INFORMATION THEORY FOR MATHEMATICIANS¹
By J. Wolfowitz
Cornell University 


A more descriptive term for information theory and one preferred by the 
present writer is “the theory of coding of messages.” In this expository note we
will describe briefly some basic concepts of this theory when transmission is 
through a “noisy channel” (noise = chance errors). We shall assume that both 
the transmitting alphabet and the receiving alphabet consist of two symbols, 
0 and 1, say. This represents no loss in generality because the extension to any 
other alphabet, say one of twenty-six symbols, is immediate and presents no 
difficulty at all. 

The fundamental paper of the theory is [1]; other important papers are [2], 
[3], [4], and [5]. The papers most easily intelligible to the mathematician are 
probably [3], [4], [7], and [8]. The latter three deal with the subject matter of the 
present paper; [4] and [7] may each be read without any prior reading, and [8] 
is a sequel of [7]. In the present paper we describe four theorems proved in [7] 
and [8] and their relation to prior results. 

Suppose that a person has a vocabulary of S words, any of which he may want 
to transmit, in any frequency and in any order, over some channel. We emphasize 
that we do not assume anything about the frequency with which particular words 
are transmitted, nor that the words to be transmitted are selected by any random 
process; in this respect our treatment differs from most of those in the literature. 

Let the words be numbered in some fixed but arbitrary manner. Then
transmitting a word is equivalent to transmitting one of the integers 1, 2, ..., S.
Let s = log S (all logarithms in this paper are to the base 2). Then there are S
sequences of s elements each², each element either 0 or 1. If there is no noise,
i.e., error of transmission, then, to transmit any word one has only to transmit
the appropriate sequence of s zeros or ones.
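
The noiseless encoding just described can be sketched as follows. This is only an illustration under stated assumptions: the names `sequence_length`, `word_to_bits` and `bits_to_word` are hypothetical, words are numbered 1, ..., S as in the text, and s is taken to be the smallest integer not less than log S.

```python
def sequence_length(vocabulary_size: int) -> int:
    """Smallest integer s with 2**s >= S (at least 1), i.e. log S rounded up."""
    if vocabulary_size < 1:
        raise ValueError("the vocabulary must contain at least one word")
    return max(1, (vocabulary_size - 1).bit_length())


def word_to_bits(index: int, vocabulary_size: int) -> list[int]:
    """Encode word number `index` (1..S) as a sequence of s zeros and ones."""
    if not 1 <= index <= vocabulary_size:
        raise ValueError("index must lie in 1..S")
    s = sequence_length(vocabulary_size)
    value = index - 1
    # Most significant element first.
    return [(value >> (s - 1 - k)) & 1 for k in range(s)]


def bits_to_word(bits: list[int]) -> int:
    """Invert word_to_bits when the sequence is received without error."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value + 1
```

With a vocabulary of twenty-six words, `sequence_length(26)` is 5, so each word is sent as five elements when there is no noise.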

If there is noise then this is clearly not enough, for the transmitted sequence 
will usually be incorrectly received. What is needed is that the received sequence, 
which will usually be a moderately garbled version of the transmitted sequence, 
should still be different from the moderately garbled version of any other trans- 
mitted sequence, so that one can infer what sequence it is that has been trans- 
mitted. But this requires that the sequences to be sent be not too similar in some 
reasonable sense, lest they be confused in transmission. Hence one must employ 
sequences of length greater than s, and not all such sequences (so that
“neighboring” sequences be not sent). All these remarks will now be made precise.

Let the integer m (≥ 0) be the “memory”. A sequence of n (respectively (n − m),


Received October 31, 1957. 

1 Rietz lecture delivered (under a different title) at the Atlantic City meeting of the 
Institute of Mathematical Statistics on September 10, 1957, by invitation of the Council 
of the Institute. Work under contract with the Office of Naval Research. 

2 Obviously, if s is not an integer one should replace it by the smallest integer ≥ s.
(m + 1)) elements, each zero or one, will be called an x-sequence (resp., a
y-sequence, an α-sequence).³ A transmitted sequence (received sequence) is
always an x-sequence (y-sequence). There is given a “channel probability
function” p, defined in the domain of all α-sequences, such that, for any
α-sequence α, 0 ≤ p(α) ≤ 1. The “noisy channel” transmits an x-sequence x as
follows: Let α_1 be the α-sequence of the first (m + 1) elements of x. The
channel “performs” a chance experiment with one of two possible outcomes, 1
and 0, with respective probabilities p(α_1) and 1 − p(α_1). The outcome of the
experiment is the first element of the received sequence Y(x). Let α_2 be the
α-sequence of the 2nd, 3rd, ..., (m + 2)th elements of x. The channel now
performs a chance experiment, independent of the first, with possible outcomes 1
and 0 and respective probabilities p(α_2) and 1 − p(α_2). This is repeated until
(n − m) independent experiments have been performed. The probabilities of the
outcomes one and zero in the ith experiment are p(α_i) and 1 − p(α_i),
respectively, where α_i is the α-sequence of the ith, (i + 1)st, ..., (i + m)th
elements of x. The received sequence Y(x) is a chance y-sequence made up of the
outcomes of the experiments in consecutive order. Let y_1 be any y-sequence. If
P{Y(x) = y_1} > 0 (the symbol P{ } denotes the probability of the relation in
braces) then y_1 is called a possible received sequence when x is transmitted.
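
The channel mechanism described above can be simulated directly. The following is a minimal sketch, not part of the original paper: the names `transmit` and `probability_of` are hypothetical, the channel probability function p is represented (by assumption) as a dictionary from tuples of m + 1 elements to the probability of the outcome 1, and the numerical values in the example are arbitrary.

```python
import random


def transmit(x, p, m, rng=random):
    """Simulate Y(x) for an x-sequence x (list of 0/1) over a channel of memory m.

    p maps each alpha-sequence, given as a tuple of m + 1 elements, to the
    probability that the corresponding received element equals 1.
    """
    y = []
    for i in range(len(x) - m):
        alpha = tuple(x[i:i + m + 1])   # alpha-sequence starting at position i
        y.append(1 if rng.random() < p[alpha] else 0)
    return y


def probability_of(y, x, p, m):
    """Return P{Y(x) = y} as a product over the n - m independent experiments."""
    if len(y) != len(x) - m:
        return 0.0
    prob = 1.0
    for i, outcome in enumerate(y):
        q = p[tuple(x[i:i + m + 1])]
        prob *= q if outcome == 1 else 1.0 - q
    return prob


# Illustrative channel with memory m = 1 (values chosen arbitrarily).
p_example = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.9}
x_example = [0, 1, 1, 0, 1]
y_example = transmit(x_example, p_example, 1, random.Random(0))
```

A y-sequence is a possible received sequence for x exactly when `probability_of` returns a positive value for it.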
Let λ, 0 < λ < 1, be a given number. A “code” of length t is a set
{(x_i, A_i), i = 1, ..., t} where each x_i is an x-sequence, each A_i is a set of
y-sequences, the A_i are all disjoint, and for each i, i = 1, ..., t,

\[ P\{Y(x_i) \in A_i\} \ge 1 - \lambda. \]



To be able to transmit S words we need a code of length S. The practical
application of a code is as follows: When one wishes to transmit the ith word one
transmits the x-sequence x_i. Whenever the receiver receives a y-sequence which
is in A_j, he always concludes that the jth word has been sent. When the receiver
receives a y-sequence not in A_1 ∪ A_2 ∪ ... ∪ A_t he may draw any conclusion he
wishes about the word that has been sent. The probability that any word
transmitted will be incorrectly received is ≤ λ.

The quantity (1/n) log t is called the rate of transmission. The practical
advantages of a high rate of transmission are obvious. In this paper we shall be
concerned with the problem of determining, or at least bounding, the highest
possible rate of transmission.

If p(α_1) = p(α_2), then the two α-sequences α_1 and α_2 are indistinguishable in
transmission. Barring such cases for simplicity, then, whatever be λ, 0 < λ < 1,
it is always possible to find an n and then a code of length S, provided one is
willing to transmit at a sufficiently small rate. By sufficient repetition of the word
to be transmitted one can insure that the probability of its correct reception
exceeds 1 − λ.⁴
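
The effect of repetition can be illustrated numerically. The sketch below makes several simplifying assumptions that are not in the text: m = 0, a vocabulary of only two words sent as n zeros or n ones, a symmetric channel in which each element is received incorrectly with probability q < 1/2, and decoding by majority vote (A_1 is the set of y-sequences with fewer than n/2 ones, A_2 the rest, n odd). The function names are hypothetical.

```python
from math import comb


def majority_error(n, q):
    """P(more than half of n independent elements are received incorrectly), n odd."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range((n + 1) // 2, n + 1))


def shortest_repetition(q, lam):
    """Smallest odd n whose two-word repetition code has error probability <= lam."""
    if not 0 <= q < 0.5:
        raise ValueError("this sketch assumes an error probability below 1/2")
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    n = 1
    while majority_error(n, q) > lam:
        n += 2
    return n


n = shortest_repetition(0.1, 0.01)
rate = 1 / n    # (1/n) log t with t = 2 words
```

Under these assumptions the error bound λ can always be met, but only by letting n grow, so that the rate 1/n falls; this is the sense in which repetition buys reliability at a small rate of transmission.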


3 These terms are used only in [7] and [8].

4 For example, “estimation” of the word transmitted may be by the method of maximum
likelihood. The words in the vocabulary are the possible “values” of the parameter to be
estimated. Since there are only finitely many words in the vocabulary the method of
maximum likelihood is uniformly consistent.
What we have called a code in the present paper is usually called “an error
correcting” code⁵ in the literature of coding theory. The latter often admits as
codes systems which do not meet the definition of code given above. Much of the
literature of coding theory is concerned with the situation where the words
to be transmitted are chosen from the vocabulary by a chance process with known
distribution. Without discussing this matter further here we invite the reader to
verify that the results cited below about the existence of (error correcting) codes
of certain lengths hold a fortiori when the words to be transmitted are chosen by
a chance process.


Let M_2 be the class of all stationary, metrically transitive stochastic processes

X_1, X_2, X_3, ...

where the chance variables X_i can take only the values 0 and 1. Let M_1 be the
subclass of M_2 in which the X_i constitute a Markov chain. Let M_0 be the
subclass of M_1 in which the X_i are independently distributed. We shall shortly define
a functional φ on every member of M_2 (more precisely, φ will be a functional of
the distribution functions of the stochastic processes). In the meantime, let
C_2, C_1, C_0 be the supremum of φ over M_2, M_1, M_0, respectively.
Then, of course, C_0 ≤ C_1 ≤ C_2.

Let ε always be an arbitrary positive number. The following Theorem A was
first proved by Shannon [1] for the situation when the words to be transmitted
are chosen by a known random process and for, in general, not error correcting
codes.


THEOREM A. For sufficiently large n there exists a code of length


\[ 2^{n(C_1 - \varepsilon)}. \]

(In [1] Shannon not only proved this remarkable theorem but brilliantly laid the
foundations of the whole subject.) Basing himself on the ingenious and
important work of Feinstein [2] and McMillan [5], Khintchine in a very important
paper [4] rigorously proved

THEOREM B. For sufficiently large n there exists a code of length

\[ 2^{n(C_2 - \varepsilon)}. \]


While Khintchine’s paper does not explicitly treat error correcting codes, one can 
deduce from his proof that Theorem B holds for error correcting codes. 

Theorem B obviously implies Theorem A (both for error correcting codes). 
The question arises whether Theorem B is stronger than Theorem A, i.e.,
whether C_1 < C_2. For m = 0 we will see below that the answer is in the negative.
For general m the subject is under investigation.

In [7] (Theorem 3) the present author gave an extremely simple and very 
much briefer proof of Theorem B. Using essentially the same simple methods he 
proved the following improvement on Theorem A. 


5 More about error correcting codes in, for example, [6].
THEOREM 1 of [7]: For any n there exists a code of length


\[ 2^{nC_1 - K_1\sqrt{n}}, \]


where K_1 is a positive constant⁶ which does not depend on n.

We next concern ourselves with the important and interesting question of an
upper bound for the length of an (error correcting) code. For the codes
considered by Shannon the latter stated⁷ ([1]) that there cannot exist a code of
length greater than


\[ 2^{n(C_0 + \varepsilon)}. \]


Shannon gave a proof to which all others in the literature refer. Khintchine [4] 
pointed out that neither the argument of [1] nor any of the arguments to be found 
in the literature constitute a proof or even the outline of a proof; he also pointed 
out the desirability of proving the result and mentioned some of the difficulties. 

In [7] (Theorem 2) the author proved the following theorem: When m = 0
there is a positive constant⁸ K_2 such that there cannot exist an (error correcting)
code of length greater than

\[ 2^{nC_0 + K_2\sqrt{n}}. \]


An immediate consequence of this theorem is that, when m = 0, C_0 = C_1 = C_2.
Hence, when m = 0, Theorem B adds nothing to Theorem A, and both are weaker 
than Theorem 1. 


Before passing to the case m > 0 we complete the above discussion by
defining the functional φ. Let

\[ X = (X_1, \ldots, X_n) \]

and define

\[ Y(X) = (Y_1, \ldots, Y_{n-m}) = Y \quad \text{(say)}. \]


(More detailed definitions in [7]; Y is essentially the chance sequence received
when the chance sequence X is sent.) Define the symbol


P{Y = y | X = x}   (P{X = x | Y = y})


as the conditional probability that Y = y, given X = x (that X = x, given
Y = y). We define the following functions of the chance variables X and Y:
When X = x and Y = y, then


the function        equals
P{X}                P{X = x}
P{Y}                P{Y = y}
P{X | Y}            P{X = x | Y = y}
P{Y | X}            P{Y = y | X = x}

6 K_1 depends on the channel probability function.
7 There is some ambiguity about the theorem actually stated.
8 K_2 depends on the channel probability function.
Let E denote the expected value operator. It is proved that the following limits
all exist:


E\log P{X} dD, 


‘log 


Also it is true and easy to prove that


D, + D, D. + D 


¢ Db, -— D D. Ds . 
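
For m = 0 the quantity C_0 can be approximated numerically. The sketch below rests on an assumption taken from the standard theory rather than from the definitions above: that for independent, identically distributed X_i and m = 0 the functional φ equals the mutual information per element between a transmitted and a received element. The function names and the grid search are illustrative choices only; p0 and p1 stand for p(0) and p(1), the probabilities of receiving a 1 when 0 or 1 is sent.

```python
from math import log2


def binary_entropy(u):
    """Entropy (base 2) of a chance variable taking two values with probabilities u, 1 - u."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * log2(u) - (1 - u) * log2(1 - u)


def mutual_information(pi, p0, p1):
    """Mutual information per element when P{X_i = 1} = pi (assumed form of the functional)."""
    out_one = (1 - pi) * p0 + pi * p1          # P{received element = 1}
    conditional = (1 - pi) * binary_entropy(p0) + pi * binary_entropy(p1)
    return binary_entropy(out_one) - conditional


def capacity_memoryless(p0, p1, grid=10001):
    """Approximate the supremum over independent processes by a grid search over pi."""
    return max(mutual_information(k / (grid - 1), p0, p1) for k in range(grid))
```

When p(0) = p(1) the two elements are indistinguishable in transmission and the computed value is zero, in agreement with the earlier remark about indistinguishable α-sequences.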


We now turn to the general case m ≥ 0. For this case Theorem 4 of [8] gives a
general upper bound for the length of an (error correcting) code. When m = 0,
Theorem 4 specializes to Theorem 2. Whether Theorem 4 gives the “best” upper
bound (as Theorem 2 does for m = 0) is still under investigation. Unfortunately,
to state Theorem 4 one needs a page of preliminary definitions and then the
theorem is stated in terms which require the reader to be familiar with the theory
of Markov chains. (However, the application of the theorem as described in the
discussion of [8] which follows its proof is little more difficult than that of Theorem
2.) For these reasons it seems best to refer the interested reader to [8].


Postscript added in December, 1957. 


Since this paper was submitted for publication the author has obtained the
following result: A number J is defined by means of certain algebraic and analytic
operations on the channel probability function which we shall not describe here.
For any positive ε and n sufficiently large, there exists a code of length
2^{n(J − ε)} and there cannot exist a code of length greater than 2^{n(J + ε)}.
This result can be approximately described by saying that 2^{nJ} is the maximum
achievable code length.
SOME PROBLEMS CONNECTED WITH STATISTICAL INFERENCE
By D. R. Cox

Birkbeck College, University of London¹


1. Introduction. This paper is based on an invited address given to a joint 
meeting of the Institute of Mathematical Statistics and the Biometric Society 
at Princeton, N. J., 20th April, 1956. It consists of some general comments, few 
of them new, about statistical inference. 

Since the address was given publications by Fisher [11], [12], [13], have pro- 
duced a spirited discussion [7], [21], [24], [31] on the general nature of statistical 
methods. I have not attempted to revise the paper so as to comment point by 
point on the specific issues raised in this controversy, although I have, of course, 
checked that the literature of the controversy does not lead me to change the 
opinions expressed in the final form of the paper. Parts of the paper are con- 
troversial; these are not put forward in any dogmatic spirit. 


2. Inferences and decisions. A statistical inference will be defined for the 
purposes of the present paper to be a statement about statistical populations made
from given observations with measured uncertainty. An inference in general is 
an uncertain conclusion. Two things mark out statistical inferences. First, 
the information on which they are based is statistical, i.e. consists of observations 
subject to random fluctuations. Secondly, we explicitly recognise that our con- 
clusion is uncertain, and attempt to measure, as objectively as possible, the un- 
certainty involved. Fisher uses the expression ‘the rigorous measurement of 
uncertainty’. 

A statistical inference carries us from observations to conclusions about the 
populations sampled. A scientific inference in the broader sense is usually con- 
cerned with arguing from descriptive facts about populations to some deeper 
understanding of the system under investigation. Of course, the more the statisti- 
cal inference helps us with this latter process, the better. For example, consider 
an experiment on the effect of various treatments on the macroscopic properties 
of a polymer. The statistical inference is concerned with what can be inferred 
from the experimental results about the true treatment effects. The scientific 
inference might concern the implications of these effects for the molecular 
structure of the polymer; the statistical uncertainty is only a part, sometimes 
small, of the uncertainty of the final inference. 

Statistical inferences, in the sense meant here, involve the data, a specification 
of the set of possible populations sampled and a question concerning the true 
populations. No consideration of losses is usually involved directly in the in- 
ference, although these may affect the question asked. If the population sampled 

Received October 7, 1957; revised February 10, 1958. 


1 Work done at the Department of Biostatistics, School of Public Health, University of 
North Carolina. 
has itself been selected by a random procedure with known prior probabilities,
it seems to be generally agreed that inference should be made using Bayes’s
theorem. Otherwise, prior information concerning the parameter of direct
interest² will not be involved in a statistical inference. The place of prior
information is discussed some more when we come to talk about decisions, but the
general point is that prior information that is not statistical cannot be included 
without abandoning the frequency theory of probability, and information that 
is derived from other statistical data can be handled by methods for the com- 
bination of data. 

The theory of statistical decision deals with the action to take on the basis of 
statistical information. Decisions are based on not only the considerations listed 
for inferences, but also on an assessment of the losses resulting from wrong 
decisions, and on prior information, as well as, of course, on a specification of the 
set of possible decisions. Current theories of decision do not give a direct measure 
of the uncertainty involved in making the decision; as explained above, a sta- 
tistical inference is regarded here as having an explicitly measured uncertainty, 
and this is to be thought of as an essential distinction between statistical de- 
cisions and statistical inferences. 

Thus, significance tests and confidence intervals, if looked at in the way ex- 
plained below, are inference procedures. Discriminant analysis, considered as a 
method for classifying individuals into one of two groups, is a decision pro- 
cedure; considered as a tool for assigning a score to an individual to say how 
reasonable it is that the individual comes from one group rather than the other, 
it is an inference procedure. Strict point estimation represents a decision; esti- 
mation by point estimate and standard error is a condensed and approximate 
form of interval estimation and is an inference procedure. Estimation by a 
posterior distribution derived from an agreed prior distribution is an inference 
procedure. A test of a hypothesis, considered in the literal Neyman-Pearson 
sense as a rule for taking one of two decisions concerning a statistical
hypothesis, is a decision procedure, in which prior knowledge and losses enter
implicitly. The reader may find it helpful to consider the extent to which the
specification, implicitly or explicitly, of losses and prior knowledge is essential for
solution of the problems just listed as ones of decision.

For example, consider the analysis of an experiment to compare two in- 
dustrial processes, A and B. The statistical inference might be that, under cer- 
tain assumptions about the populations, process A gives a yield higher than that 
of process B, the difference being statistically significant past the 1/1000 level, 
90, 95 and 99 per cent confidence intervals for the amount of the true difference 
being such and such. The decision might be that having regard to the differences 
in yield of practical importance, and our prior knowledge, we will consider that 
the experiment has established, under the conditions examined, that process A 
has a higher yield than B and will take future action accordingly.


2 i.e. relevant information about the parameter of interest, other than that contained
in the data and in the specification of the set of possible parameter values.
An inference without a prior distribution can be considered as answering the
question: ‘What do these data entitle us to say about a particular aspect of the 
populations that interest us?’ It is, however, irrational to take action, scientific 
or technological, without considering both all available relevant information, 
including for example the prior reasonableness of different explanations of a set 
of data, and also the consequences of doing the wrong thing. Why then, do we 
bother with inferences which go, as it were, only part of the way towards the final 
decision? 

Even in problems where a clear-cut decision is the main object, it very often 
happens that the assessment of losses and prior information is subjective, so 
that it will help to get clear first the relatively objective matter of what the 
data say, before embarking on the more controversial issues. In particular, it 
may happen either that the data are little aid in deciding the point at issue, or 
that the data suggest one conclusion so strongly that the only people in doubt 
about what to do are those with prior beliefs, or opinions about losses, heavily 
biased in one direction. In some fields, too, it may be argued that one of the main 
calls for probabilistic statistical methods arises from the need to have agreed 
rules for assessing strength of evidence. 

A full discussion of this distinction between inferences and decisions will not 
be attempted here. Three more points are, however, worth making briefly. 
First, some people have suggested that what is here called inference should be 
considered as ‘summarization of data’. This choice of words seems not to recog- 
nise that an essential element is the uncertainty involved in passing from the 
observations to the underlying populations.³ Secondly, the distinction drawn
here is between the applied problem of inference and the applied problem of 
decision-making; it is possible that a satisfactory set of techniques for inference 
could be constructed from a mathematical structure very similar to that used in 
decision theory. 

Finally, it might be argued that in making an inference we are ‘deciding’ 
to make a statement of a certain type about the populations and that therefore, 
provided that the word decision is not interpreted too narrowly, the study of 
statistical decisions embraces that of inference. The point here is that one of the 
main general problems of statistical inference consists in deciding what types of 
statement can usefully be made and exactly what they mean. In statistical de- 
cision theory, on the other hand, the possible decisions are considered as already 
specified. 


3. The sample space. Statistical methods work by referring the observations
S to a sample space Σ of observations that might have been obtained. Over Σ
one or more probability measures are defined and calculations in these probability
distributions give our significance limits, confidence intervals, etc. Σ is usually
taken to be the set of all possible samples having the same size and structure
as the observations.








3 A referee has suggested the term ‘summarization of evidence,’ which seems a good one.
Fisher (see, for example, [11]) and Barnard [4] have pointed out that Σ may
have no direct counterpart in indefinite repetition of the experiment. For
example, if the experiment were repeated, it may be that the sample size would
change. Therefore what happens when the experiment is repeated is not
sufficient to determine Σ, and the correct choice of Σ may need careful consideration.

As a comment on this point, it may be helpful to see an example where the
sample size is fixed, where a definite space Σ is determined by repetition of the
experiment and yet where probability calculations over Σ do not seem relevant
to statistical inference.

Suppose that we are interested in the mean θ of a normal population and that,
by an objective randomization device, we draw either (i) with probability ½,
one observation, x, from a normal population of mean θ and variance σ_1², or (ii)
with probability ½, one observation, x, from a normal population of mean θ and
variance σ_2², where σ_1², σ_2² are known, σ_1² ≫ σ_2², and where we know in any
particular instance which population has been sampled.

More realistic examples can be given, for instance in terms of regression prob- 
lems in which the frequency distribution of the independent variable is known. 
However, the present example illustrates the point at issue in the simplest terms. 
(A similar example has been discussed from a rather different point of view in 
[6], [29]). 

The sample space formed by indefinite repetition of the experiment is clearly
defined and consists of two real lines Σ_1, Σ_2, each having probability ½, and
conditionally on Σ_i there is a normal distribution of mean θ and variance σ_i².

Now suppose that we ask, accepting for the moment the conventional
formulation, for a test of the null hypothesis θ = 0, with size say 0.05, and with
maximum power against the alternative θ′, where θ′ ≈ σ_1 ≫ σ_2.

Consider two tests. First, there is what we may call the conditional test, in
which calculations of power and size are made conditionally within the particular
distribution that is known to have been sampled. This leads to the critical
regions x > 1.64σ_1 or x > 1.64σ_2, depending on which distribution has been
sampled.

This is not, however, the most powerful procedure over the whole sample space.
An application of the Neyman-Pearson lemma shows that the best test depends
slightly on θ′, σ_1, σ_2, but is very nearly of the following form. Take as the critical
region


x > 1.28σ_1, if the first population has been sampled;


x > 5σ_2, if the second population has been sampled.


Qualitatively, we can achieve almost complete discrimination between θ = 0
and θ = θ′ when our observation is from Σ_2, and therefore we can allow the
error rate to rise to very nearly 10% under Σ_1. It is intuitively clear, and can
easily be verified by calculation, that this increases the power, in the region of
interest, as compared with the conditional test.
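
The calculation referred to can be sketched as follows. The numerical values σ_1 = 100, σ_2 = 1 and θ′ = σ_1 are illustrative assumptions chosen to satisfy θ′ ≈ σ_1 ≫ σ_2, and the function names are hypothetical; the critical values 1.64, 1.28 and 5 are those given in the text.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))


def rejection_probability(theta, sigma1, sigma2, c1, c2):
    """Probability of rejection over the whole sample space.

    Each population is sampled with probability 1/2; the critical region is
    x > c1 * sigma1 for the first population and x > c2 * sigma2 for the second.
    """
    return (0.5 * upper_tail(c1 - theta / sigma1)
            + 0.5 * upper_tail(c2 - theta / sigma2))


sigma1, sigma2 = 100.0, 1.0      # illustrative values only
theta_alt = sigma1               # the alternative theta' of the text

size_conditional = rejection_probability(0.0, sigma1, sigma2, 1.64, 1.64)
size_unconditional = rejection_probability(0.0, sigma1, sigma2, 1.28, 5.0)
power_conditional = rejection_probability(theta_alt, sigma1, sigma2, 1.64, 1.64)
power_unconditional = rejection_probability(theta_alt, sigma1, sigma2, 1.28, 5.0)
```

With these values both tests have size close to 0.05, while the power at θ′ is roughly 0.63 for the conditional test and roughly 0.69 for the unconditional one.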

Now if the object of the analysis is to make statements by a rule with certain 
specified long-run properties, the unconditional test just given is in order,
although it may be doubted whether the specification of desired properties is in
this case very sensible. If, however, our object is to say ‘what we can learn from
the data that we have’, the unconditional test is surely no good. Suppose that
we know we have an observation from Σ_1. The unconditional test says that we
can assign this a higher level of significance than we ordinarily do, because if
we were to repeat the experiment, we might sample some quite different
distribution. But this fact seems irrelevant to the interpretation of an observation
which we know came from a distribution with variance σ_1². That is, our
calculations of power, etc. should be made conditionally within the distribution known
to have been sampled, i.e. if we are using tests of the conventional type,
the conditional test should be chosen.

To sum up, if we are to use statistical inferences of the conventional type, the
sample space Σ must not be determined solely by considerations of power, or by
what would happen if the experiment were repeated indefinitely. If difficulties
of the sort just explained are to be avoided, Σ should be taken to consist, so far
as is possible, of observations similar to the observed set S, in all respects which
do not give a basis for discrimination between the possible values of the unknown
parameter θ of interest. Thus, in the example, information as to whether it was
Σ_1 or Σ_2 that we sampled tells us nothing about θ, and hence we make our
inference conditionally on Σ_1 or Σ_2.

Fisher has formalized this notion in his concept of ancillary statistics [10],
[23], [27]. His definitions deal with the situation without nuisance parameters
and before outlining an extension that attempts to cope with nuisance
parameters, it is convenient to state a slight modification of the original definitions.
Let m be a minimal set of sufficient statistics⁴ for the unknown parameter of
interest, θ, and suppose that m can be written (t, a), where the distribution of a
is independent of θ, and that no further components can be extracted from t
and incorporated in a. That is, we divide, if possible, the space of m into sets
each similar to the sample space, and take the finest such division, assumed here
to be unique subject to regularity conditions. Then a is called an ancillary statistic
and we agree to make inferences conditionally on the observed a.

EXAMPLES. (i) In the example of section 3, a minimal set consists of the
observation, x, and an indicator variable to show which population has been
sampled. The latter satisfies the conditions for being an ancillary statistic.
Provided that the possible values of the mean θ include an interval, there is no set
of x values with the same probability for all θ.

(ii) Under the ordinary assumptions of normal linear regression theory, plus 
the assumption that the independent variable has any known distribution (with- 
out unknown parameters), the values of the independent variable form an 
ancillary statistic. 

(iii) The following example is derived from one put forward by a referee. 


4 The terms used by Fisher are that a minimal set of sufficient statistics with more
components than there are parameters is called exhaustive and a minimal set with the same
number of components as there are parameters is called sufficient.
Let x be a single observation with density 1 + 2θx, −½ ≤ x ≤ ½, −1 ≤ θ ≤ 1.
Then we can write x = [sgn x, |x|] and |x| has the same density for all θ. Hence
we argue conditionally on the observed value of |x|. For example in testing
θ = 0 against θ > 0, the possible P values (see section 5) are 1 and ½. This may
seem a curious result but is, I think, reasonable if one regards a significance
test as concerned with the extent to which the data are consistent with the null
hypothesis.

Suppose now that there are nuisance parameters φ. Let m be a minimal set
of sufficient statistics for estimating (θ, φ) and suppose that m can be partitioned
into [t, s, a] in such a way that

(i) functions of t and θ, so-called pivotal quantities, exist with a distribution
conditionally on a that is independent of φ. If any component of s is added to
t or a, this independence from φ no longer holds. Further, no components can be
extracted from t and incorporated in a;

(ii) the values of a and s give no direct information about θ in the sense to be
defined below. Then we agree to make inferences about θ from the conditional
distribution of (i).

We need then to define what is meant by saying that a quantity y gives no
direct information about θ, when nuisance parameters φ are present. One
condition that might be considered is that the density p(y; θ, φ) should be
independent of θ. This seems too strong, as does also the requirement that for every
different pair θ_1, θ_2 and for every y, p(y; θ_1, φ) / p(y; θ_2, φ) should run through
all positive real values as φ varies. An appropriate condition seems to be that
given admissible values y, θ_1, θ_2, φ, there exist admissible θ, φ_1, φ_2, such that

\[ \frac{p(y; \theta_1, \phi)}{p(y; \theta_2, \phi)} = \frac{p(y; \theta, \phi_1)}{p(y; \theta, \phi_2)}. \tag{1} \]

The import of the condition is that any contemplated distinction between two
values of θ might just as well be regarded as a distinction between two values
of φ.

For example, suppose that x is a single observation from a normal distribution
of unknown mean φ and variance θ. Then x gives no direct information about θ
in the sense of (1), provided that φ is completely unknown. Another example is
normal regression theory with the independent variable having an arbitrary
unknown distribution, not involving the regression parameters of interest [10].
Here a is the set of values of the independent variable and s is the sum of squares
about the regression line, assuming that the residual variance about the
regression line, φ, is a nuisance parameter.

For a third example, let $r_1$, $r_2$ be randomly drawn from Poisson distributions
of means $\mu_1$, $\mu_2$ and let $\mu_2/\mu_1 = \theta$ be the parameter of interest; that is write
the means as $\phi$, $\phi\theta$, where $\phi$ is a nuisance parameter. The likelihood of $r_1$, $r_2$
can be written

$$\frac{e^{-\phi(1+\theta)}\{\phi(1+\theta)\}^{a}}{a!} \cdot \frac{a!}{t!\,(a-t)!}\Bigl(\frac{1}{1+\theta}\Bigr)^{t}\Bigl(\frac{\theta}{1+\theta}\Bigr)^{a-t},$$

where $t = r_1$, $a = r_1 + r_2$ and with s null. The equation (1) is satisfied, telling
us that a gives us no direct information about $\theta$. Therefore significance and
confidence calculations are to be made conditionally on the observed value of a,
as is the conventional procedure [25].
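
The conditional calculation for the Poisson example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a procedure given in the text: the function name, the one-sided alternative (ratio larger than the null value) and the sample counts are all hypothetical. It uses only the second factor of the likelihood above, namely that, given $a$, the count $t = r_1$ is binomial with index $a$ and probability $1/(1+\theta)$.

```python
from math import comb


def conditional_p_value(r1, r2, theta0=1.0):
    """Tail area for H0: theta = theta0 against theta > theta0, given a = r1 + r2.

    Conditionally on a, t = r1 is binomial(a, 1 / (1 + theta)), so a large
    ratio theta makes small values of t more probable.
    """
    a = r1 + r2
    p = 1.0 / (1.0 + theta0)
    # Sum the conditional probabilities of counts as small as or smaller than r1.
    return sum(comb(a, t) * p ** t * (1.0 - p) ** (a - t) for t in range(r1 + 1))


# Hypothetical counts: r1 = 3 and r2 = 12, testing equality of the two means.
print(conditional_p_value(3, 12))
```

The nuisance parameter $\phi$ never enters the calculation, which is the practical consequence of arguing conditionally on $a$.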

To apply the definitions we have to regard our observations as generated by a 
random process; the idea of ancillary statistics simply tells us how to cut down 
the sample space to those points relevant to the interpretation of the observations 
we have. 

In the problems without nuisance parameters, it is known that methods of 
inference [5], that use only observed values of likelihood ratios, and not tail 
areas, avoid the difficulties discussed above, since the likelihood ratio is the 
same whether we argue conditionally or not. Lindley, using concepts from [18], 
has recently shown that for a broad class of problems with nuisance parameters, 
the conditional methods are optimum in the Neyman-Pearson sense. 

Another important problem connected with the choice of the sample space, 
not discussed here, concerns the possibility and desirability of making inferences 
within finite sample spaces obtained by permuting the observations; see, for 
example, [16]. 


4. Interval estimation. Much controversy has centred on the distinction 
between fiducial and confidence estimation. Here follow five remarks, not about 
the mathematics, but about the general aims of the two methods. 

(i) The fiducial approach leads to a distribution for the unknown parameter, 
whereas the method of confidence intervals, as usually formulated, gives only 
one interval at some preselected level of probability. This seems at first sight a 
distinct point in favour of the fiducial method. For when we write down the 
confidence interval $(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$ for a completely unknown
normal mean, there is certainly a sense in which the unknown mean $\theta$ is likely to
lie near the centre of the interval, and rather unlikely to lie near the ends and
in which, in this case, even if $\theta$ does lie outside the interval, it is probably not
far outside. The usual theory of confidence intervals gives no direct expression 
of these facts. 

Yet this seems to a large extent a matter of presentation; in the common 
simple cases, where the upper $\alpha$ limit for $\theta$ is monotone in $\alpha$, there seems no
reason why we should not work with confidence distributions for the unknown 
parameter. These can either be defined directly, or can be introduced in terms 
of the set of all confidence intervals at different levels of probability. Statements 
made on the basis of this distribution, provided we are careful about their form, 
have a direct frequency interpretation. In applications it will often be enough 
to specify the confidence distribution, by for example a pair of intervals, and 
this corresponds to the common practice of quoting say both the 95 per cent 
and the 99 per cent confidence intervals. 
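
For the normal mean with known standard deviation, the idea of a confidence distribution built from the set of all confidence limits can be sketched as follows. This is a minimal illustration, not a construction taken from the text: the function names and the data summary are hypothetical, and the "upper $\alpha$ limit" is assumed here to mean the limit that the parameter exceeds with probability $\alpha$.

```python
from statistics import NormalDist


def upper_limit(xbar, sigma, n, alpha):
    """Upper alpha limit: theta exceeds it with probability alpha (sigma known)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return xbar + z * sigma / n ** 0.5


def confidence_distribution(theta, xbar, sigma, n):
    """Confidence level attached to the statement 'true mean <= theta'."""
    return NormalDist(mu=xbar, sigma=sigma / n ** 0.5).cdf(theta)


def equal_tailed_interval(xbar, sigma, n, level):
    """Interval with equal probabilities in each tail, e.g. level = 0.95."""
    tail = (1.0 - level) / 2.0
    upper = upper_limit(xbar, sigma, n, tail)
    return (2.0 * xbar - upper, upper)


# Hypothetical data summary: mean 10.0 from n = 25 observations, sigma = 2.0.
for level in (0.95, 0.99):
    print(level, equal_tailed_interval(10.0, 2.0, 25, level))
```

Because the upper limit is monotone in $\alpha$, the two functions are inverses of one another, so quoting a pair of intervals amounts to quoting two pairs of points on the same distribution.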

It is not clear what can be done in those complex cases [8], [26], where say
the upper 5 per cent limit for $\theta$ is larger than the upper 1 per cent limit, or
indeed whether confidence interval estimation is at all satisfactory in such cases.
Within the class of distributions with monotone likelihood ratio [15], such
difficulties will, however, be avoided.

If we consider that the object of interval estimation is to give a rule for making,
on the basis of each set of data, a statement about the unknown parameter, a
certain preassigned proportion of the statements to be correct in the long run,
consideration of the confidence distribution may seem unnecessary and possibly
invalid. The attitude taken here is that the object is to attach, on the basis of
data S, a measure of uncertainty to different possible values of $\theta$, showing what
can be inferred about $\theta$ from the data. The frequency interpretation of the
confidence intervals is the way by which the measure of uncertainty is given a 
concrete interpretation, rather than the direct object of the inference. From this 
point of view it is difficult to see an objection to the consideration of many 
confidence statements simultaneously. 

If the whole set of intervals is regarded as the fundamental concept, and if 
we are interested both in upper and in lower limits for $\theta$, we may conveniently
specify the set by giving say the upper and lower 2½% points, etc., it being a
useful convention, and no more, that the 95% interval so obtained should have 
equal probabilities associated with each tail. The elaborate discussion that is 
sometimes necessary in the conventional theory to decide which particular 
combination of upper and lower tail areas is best to get a 95% interval seems, 
from this point of view, irrelevant. 

(ii) It is sometimes claimed as an advantage of fiducial estimation that it is 
restricted to methods that use ‘all the information in the data’, while confidence 
estimation includes any method giving the requisite frequency interpretation. 
This claim is lent some support by those accounts of confidence interval theory 
which use the words ‘valid’ or ‘exact’ for a method of calculating intervals 
that has, under a given mathematical set-up, an exact frequency interpretation, 
no matter how inadequate the intervals may be in telling us what can be learnt 
from the data. 

However, good accounts of the theory of confidence intervals stress equally 
the need to cover the true value with the required probability and the require- 
ment of having the intervals as narrow as possible in a suitable sense [21]. Very 
special importance, therefore, attaches to intervals based on exhaustive esti- 
mates. It is true that there are differences between the approaches in that the 
fiducial method takes the use of exhaustive estimates as a primary requirement, 
whereas in the theory of confidence intervals the use of exhaustive estimates is 
deduced from some other condition. This does not seem however to amount to 
a major difference between the methods. 

(iii) The uniqueness of inferences obtained by the fiducial method has re- 
ceived much discussion recently, [9], [20], [28]. Uniqueness is important be- 
cause, once the mathematical form of the populations is sufficiently well specified, 
it should be possible to give a single answer of a given type to the question 
‘what do the data tell us about $\theta$?’.

The present position is that several cases are known where the fiducial method
leads to non-unique answers, although it is, of course, entirely possible that a 
way will be found of formulating fiducial calculations to make them unique. A 
comparison with confidence intervals is difficult here, because in many of the 
multi-parameter problems, the single parameters for which confidence estima- 
tion is known to be possible at all are very limited. No cases of non-unique 
optimum confidence intervals seem to have been published. 

(iv) If sufficient estimation, in Fisher’s sense, is possible for a group of pa- 
rameters, fiducial inference will usually be possible about any one of them or 
any combination of them, since the joint fiducial distribution of all the pa- 
rameters can be found and the unwanted parameters integrated out. Exact 
confidence estimation is in general possible only for restricted combinations of 
parameters. An example is the Behrens-Fisher problem, where exact fiducial 
inference is possible. The situation about confidence estimation in this case is 
far from clear, but may be that the asymptotic expansion proposed by Welch
[30], while giving a close approximation to an ‘exact’ system of confidence
intervals, has frequency properties depending slightly on the nuisance parameters.
Nothing seems to be known about possible optimum properties in the
Neyman-Pearson sense. In the language of testing hypotheses, Welch’s procedure is to
look for a region of constant size $\alpha$, independently of the nuisance parameters.
It is conceivable that greater power against some alternatives is attained by
having a size only bounded by $\alpha$; indeed, this is made plausible by [12].

(v) The final consideration concerns the question of frequency verification. 
Fisher has repeatedly stated that the immediate object of fiducial inference is 
not the making of statements that will be correct with given frequency in the 
long run. One may readily accept this in that one really wants to measure the 
uncertainty corresponding to different ranges of values for $\theta$, and it is quite
conceivable that one could construct a satisfactory measure of uncertainty that 
has not a direct frequency interpretation. Yet one must surely insist on some 
pretty clear-cut practical meaning to the measure of uncertainty and this 
fiducial probability has never been shown to have, except in those cases where 
it is equivalent to confidence interval estimation. J. W. Tukey’s [25] recent 
unpublished work on fiducial probability and its frequency verification may be 
mentioned here. 

A different justification of fiducial distributions that is sometimes advanced 
is to derive them from Bayes’s theorem, using a conventional form of prior 
distribution. To remain within the framework of the frequency theory of proba- 
bility, it would then be necessary to distinguish between proper frequency dis- 
tributions and hypothetical ones. The physical interpretation of the measure 
of uncertainty of statements about $\theta$ is that if $\theta$ had such and such a prior
frequency distribution, then the posterior frequency distribution of $\theta$ would be
such and such. This all amounts to a reinterpretation of Jeffreys’s theory [17].
An important advantage of this approach is that it ensures independence from
the sampling rule (see [2]) and from the difficulties of section 3. On the other
hand it seems a clumsy way of dealing with simple one-parameter problems,
especially when the choice of prior distribution is difficult.


If the above considerations are accepted, it seems reasonable to base interval 
estimation on a slightly revised form of the theory of confidence intervals. 

Estimation by confidence or fiducial distribution may be contrasted with the 
proposal [5], [13] to plot the likelihood of the unknown parameter $\theta$ in the light
of the data, standardized by the maximum likelihood over $\theta$. Advantages of the
latter method are mathematical simplicity and independence from the sampling 
rule. Disadvantages are that it is not clear how to deal with nuisance parameters, 
that it is not clear that division by the maximum value of the likelihood makes 
values in different situations genuinely comparable, and that there is some 
difficulty in giving practical interpretation to the ratios so obtained. It might 
be argued that this last difficulty arises solely from lack of familiarity with the 
method. 
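
As a concrete instance of a likelihood standardized by its maximum, consider a single Poisson count. The sketch below is an assumed illustration, not an example from the text: the function name and the observed count are hypothetical, and the Poisson model is chosen only because its maximum likelihood is available in closed form.

```python
from math import exp, log


def relative_likelihood(mu, r):
    """Likelihood of a Poisson mean mu (> 0) given count r, divided by its maximum.

    The likelihood is proportional to exp(-mu) * mu**r and is largest at mu = r.
    """
    if r == 0:
        return exp(-mu)
    return exp(r * log(mu / r) - (mu - r))


# Hypothetical observed count r = 10: tabulate the standardized likelihood.
for mu in (5.0, 8.0, 10.0, 12.0, 15.0):
    print(mu, round(relative_likelihood(mu, 10), 4))
```

The resulting ratios lie between 0 and 1 and do not depend on the sampling rule, but, as noted above, attaching a practical interpretation to a particular value such as 0.14 is the harder part.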


5. Significance tests. Suppose now that we have a null hypothesis $H_0$
concerning the population or populations from which the data S were drawn
and that we enquire ‘what do the data tell us concerning the possible truth or
falsity of $H_0$?’ Adopt as a measure of consistency with the null hypothesis

$$\mathrm{prob}\{\text{data showing as much or more evidence against } H_0 \text{ as } S \mid H_0\}. \tag{2}$$

That is, we calculate, at least approximately, the actual level of significance 
attained by the data under analysis and use this as a measure of conformity 
with the null hypothesis. The value obtained in this way is often, particularly 
in the biological literature, called the P-value. Significance tests are often used 
in practice like this, although many formal accounts of the theory of tests sug- 
gest, implicitly or explicitly, quite a different procedure. Namely, we should, 
after considering the consequences of wrongly accepting and rejecting the null 
hypothesis, and the prior knowledge about the situation, fix a significance level 
in advance of the data. This is then used to form a rigid dividing line between 
samples for which we accept the null hypothesis and those for which we reject 
the null hypothesis. A decision-type test of this sort is clearly something quite
different from the application just contemplated. 

Two aspects of significance tests will be discussed briefly here. First there is 
the question of when significance tests are useful and secondly there is the 
justification of (2) as a measure of conformity. 

We shall, for simplicity, consider situations in which the possible populations
correspond to values of a continuously varying parameter $\theta$, the null hypothesis
being say $\theta = \theta_0$. There may be nuisance parameters.

A practical distinction can be made between cases in which the null value $\theta_0$
is considered because it divides the parameter range into qualitatively different
sections and those cases in which it is thought that there is a reasonable prospect
that the null value is very nearly the true one. For example, in the comparison 
of two alternative industrial processes we might quite often have no particular 
expectation that the treatment difference is small. In such cases the significance 
test is concerned with whether we can, from the data under analysis, claim the 
existence of a difference in the same direction as that observed. Or, to look at 
the matter slightly differently, the significance level tells us at what levels the 
confidence intervals for the true difference include only values with the same 
sign as the sample difference. This idea that the significance level is concerned 
with the possibility that the true effect may be in the opposite direction from 
that observed, occurs in a different way in [17]. 

The answer to the significance test is rarely the only thing we should consider: 
whether or not significance is attained at an interesting level (say at the 10% 
level or better), some consideration should be given to whether differences that 
may exist are of practical importance, i.e. estimation should be considered as 
well as significance testing. A likely exception to this is in the analysis of rather 
limited amounts of data, where it can be taken for granted that differences of 
practical importance are consistent with the data. The point of the statistical 
analysis is in such cases to see whether the direction of any effects has been 
reasonably well established, i.e. whether a qualitative conclusion about the 
effects has been demonstrated. 

The problem dealt with by a significance test, as just considered, is different 
from that of deciding which of two treatments is to be recommended for future 
use or further investigation. This cannot be tackled without consideration of 
the differences of practical importance, the losses consequent on wrong decisions 
and the prior knowledge. Depending on these and on sample size, the level of P 
for practical action may vary widely. 

The second type of application of significance tests is to situations where 
there is a definite possibility that the null hypothesis is nearly true. (Exact 
truth of a null hypothesis is very unlikely except in a genuine uniformity trial). 
A full analysis of such a situation would involve consideration of what departure 
from the null hypothesis is considered of practical importance. However, it is 
often convenient to test the null hypothesis directly; if significant departure 
from it is obtained, consideration must then be given to whether the departure 
is of practical importance. Of course, in any case we will probably wish to 
examine the problem as one of estimation as well as of significance testing, asking 
for example, for the maximum true difference consistent with the data. 

Consider now the choice of (2) as the quantity to measure significance. To 
use the definition, we need to order the points of the sample space in terms of 
the evidence they provide against the null hypothesis. 

The most satisfactory way is the introduction, as in the usual development 
of the Neyman-Pearson theory, of the requirement of maximum sensitivity in 
the detection of certain types of departure from the null hypothesis. That is, 
we wish, in the simplest case, to maximise, if possible for all fixed $\epsilon$,

$$\mathrm{prob}_{\theta}(\text{attaining significance at the } \epsilon \text{ level}),$$

where $\theta$ represents a set-up which we desire to distinguish from the null
hypothesis. That is we choose the procedure that makes the random variable (2)
as stochastically small as possible when the alternative hypotheses are true. 
This leads in simple cases, to a unique specification of the significance proba- 
bility (2). 

In the simple case when there is a single alternative hypothesis, it seems at 
least of theoretical interest to distinguish between the problem of discrimina- 
tion and that of significance testing. In discrimination, the two populations are 
on an equal footing and there are strong arguments for considering that only 
the observed value of the likelihood ratio is relevant. The question asked is 
‘which of these populations do the observations come from?’ In significance 
testing the question is ‘are the data consistent with having come from $H_0$?’ The
alternative hypothesis serves merely to mark out the sample points giving
evidence against $H_0$.

The next question to consider is why we sum over a whole set of sample 
points rather than work in terms only of the observed point. This has been 
much discussed. The advantage of (2) is that it has a clear-cut physical inter- 
pretation in terms of the formal scheme of acceptance and rejection contem- 
plated in the Neyman-Pearson theory. To obtain a measure depending only on 
the observed sample point, one way is to take the likelihood ratio, for the ob- 
served point, of the null hypothesis versus some conventionally chosen alterna- 
tive (see [5]), and while a practical meaning can be given to this, it has less 
direct appeal. But consider a test of the following discrete null hypotheses: 


Sample value    prob. under $H_0$    prob. under $H_0'$
     0                0.80                 0.75
     1                0.15                 0.15
     2                0.05                 0.05
     3                0.00                 0.04
     4                0.00                 0.01

and suppose that the alternatives are the same in both cases and are such that
the probabilities (2) should be calculated by summing the probabilities of values
as great or greater than that observed. Suppose further that the observation 2
is obtained; under $H_0$ the significance level is 0.05, while under $H_0'$ it is 0.10.
Yet it is difficult to see why we should say that our observation is more
consistent with $H_0'$ than with $H_0$; this point has often been made before [4], [16].
On the other hand, if we are really interested in the confidence interval type
of problem, i.e. in covering ourselves against the possibility that the ‘effect’ is
in the direction opposite to that observed, the use of the tail area seems more
reasonable. As noted in section 3 the use of likelihood ratios rather than summed
probabilities avoids difficulties connected with the choice of the sample space, $\Sigma$.
We are faced with a conflict between the mathematical and logical advantages
of the likelihood ratio, and the desire to calculate quantities with a clear
practical meaning in terms of what happens when they are calculated.
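
The two tail areas quoted for the discrete example can be reproduced with a short calculation. The sketch below simply sums the tabulated probabilities; the function and variable names are illustrative and not part of the original.

```python
def tail_area(probs, observed):
    """Null probability of a sample value as great as or greater than `observed`."""
    return sum(p for value, p in probs.items() if value >= observed)


# The two discrete null hypotheses of the table (second one written h0_prime).
h0 = {0: 0.80, 1: 0.15, 2: 0.05, 3: 0.00, 4: 0.00}
h0_prime = {0: 0.75, 1: 0.15, 2: 0.05, 3: 0.04, 4: 0.01}

print(round(tail_area(h0, 2), 2))        # 0.05
print(round(tail_area(h0_prime, 2), 2))  # 0.1
print(h0[2], h0_prime[2])                # probability of the observed point itself
```

The probability of the observed value 2 is 0.05 under both hypotheses; the tail areas differ only because of probability assigned to values that were not observed, which is the source of the difficulty described above.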

In general the role that tail areas ought to play in statistical inference is far
from clear and further discussion is very desirable. The reader may refer to [1] 
and [19]. 

In this and the preceding section the problems of interval estimation and 
significance testing have been considered. There is not space to give a parallel 
discussion of the other types of statistical procedure. 


6. The role of the assumptions. The most important general matter con- 
nected with inference not discussed so far, concerns the role of the assumptions 
made in calculating significance, etc. Only a very brief account of this matter 
will be given here. 

Assumptions that we make, such as those concerning the form of the popula- 
tions sampled, are always untrue, in the sense that, for example, enough obser- 
vations from a population would surely show some systematic departure from 
say the normal form. There are two devices available for mitigating this difficulty, 
namely 

(i) the idea of nuisance parameters, i.e. of inserting sufficient unknown 
parameters into the functional form of the population, so that a better approxi- 
mation to the true population can be attained; 

(ii) the idea of robustness (or stability), i.e. that we may be able to show
that the answer to the significance test or estimation procedure would have been 
essentially unchanged had we started from a somewhat different population 
form. Or, to put it more directly, we may attempt to say how far the population 
would have to depart from the assumed form, to change the final conclusions 
seriously. This leaves us with a statement that has to be interpreted qualita- 
tively in the light of prior information about distributional shape, plus the 
information, if any, to be gained from the sample itself. This procedure is fre- 
quently used in practical work, although rarely made explicit. 

In inference for a single population mean, examples of (i) are, in order of 
complexity, to assume 

(a) a normal population of unknown dispersion; 

(b) a population given by the first two terms of an Edgeworth expansion; 

(c) in the limit, either an arbitrary population, or an arbitrary continuous
population (leading to a distribution-free procedure).

The last procedure has obvious attractions, but it should be noted that it is 
not possible to give a firm basis for choice between numerous alternative methods, 
without bringing in strong assumptions about the power properties required, 
and also that it often happens that no reasonable distribution-free method exists 
for the problem of interest. Thus if we are concerned with the mean of a popu- 
lation of unknown shape and dispersion, no distribution-free method is available 
[3]; when the property measured is extensive, the mean is often the uniquely 
appropriate parameter. 

A rather artificial example of method (ii) is that if we were given a single 
observation from a normal population and asked to assess the significance of the 
difference from zero, we could plot the level attained against the population
standard deviation $\sigma$. Then we could interpret this qualitatively in the light of
whatever prior information about $\sigma$ was available. A less artificial example
cerns the comparison of two sample variances. The ratio might be shown to be 
highly significant by the usual F test and a rough calculation made to show that 
provided that neither $\beta_2$ exceeded $\beta_2^{0}$, significance at least say at the 1 per cent
level would still occur. 
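
The artificial single-observation example is easy to tabulate. The sketch below assumes a two-sided test of a zero mean, which the text does not specify, and uses a hypothetical observed value; the function name is illustrative.

```python
from statistics import NormalDist


def level_attained(x, sigma):
    """Two-sided significance level for H0: mean = 0, one observation x ~ N(mean, sigma^2)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(x) / sigma))


# Hypothetical observation x = 2.0: level attained for a range of assumed sigma.
for sigma in (0.5, 1.0, 2.0, 4.0):
    print(sigma, round(level_attained(2.0, sigma), 4))
```

Reading down such a table shows how large the standard deviation would have to be before the conclusion changed, which is then to be set against whatever is known about it beforehand.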

In practical situations we usually employ a mixture of (i) and (ii) depending on

(a) the extent to which our prior knowledge limits the population form in
respects other than those of direct interest;

(b) the amount of information in the data about the population
characteristic that may be used as a nuisance parameter;

(c) the extent to which the final conclusion is sensitive to the particular
population characteristic of interest.

Thus, in (a) if we have a good idea of the population form, we are probably 
not much interested in the fact that a distribution-free method has certain de- 
sirable properties for distributions quite unlike that we expect to encounter. To 
comment on (b), we would probably not wish to studentize with respect to a 
minor population characteristic about which hardly any information was con- 
tained in the sample, e.g. an estimate of variance with one or two degrees of 
freedom. In small sample problems there is frequently little information about 
population shape contained in the data. Finally, there is consideration (c). If 
the final conclusion is very stable under changes of distribution form, it is 
usually convenient to take the most appropriate simple theoretical form as a 
basis for the analysis and to use method (ii). 

Now it is very probable that in many instances investigation would show 
that the same answer would, for practical purposes, result from the alternative 
types of method we have been discussing. But suppose that in a particular 
instance there is disagreement, e.g. that the result of applying a $t$ test differs
materially from that of applying some distribution-free procedure. What should 
we do? 

It can be argued that, even if we have no good reason for expecting a normal 
population, we should not be willing to accept the distribution-free answer un- 
conditionally. A serious difference between the results of the two tests would 
indicate that the conclusion we draw about the population mean depends on the 
population shape in an important way, e.g. depends on the attitude we take to 
certain outlying observations in the sample. It seems more satisfactory for a full 
discussion of the data, to state this and to assemble whatever evidence is avail- 
able about distributional form, rather than simply to use the distribution-free 
approach. Distribution-free methods are, however, often very useful in small 
sample situations where little is known about population form and where elabo- 
rate treatment of the results would be out of place. 


An interesting discussion of the role of assumptions in decision theory is given 
in [14]. 

I am much indebted to the two referees for detailed and constructive criticism
of the paper. 

ASYMPTOTIC DISTRIBUTION OF STOCHASTIC
APPROXIMATION PROCEDURES

By Jerome Sacks

California Institute of Technology

1. Introduction. Beginning with the paper of Robbins and Monro [11] much 
work has been done in stochastic approximation. The Robbins-Monro procedure 
(see [11] or Section 3 below) for finding the root of a regression equation and the 
Kiefer-Wolfowitz procedure (see [9] or Section 4 below) for finding the maximum 
of a regression function have been the chief objects of investigation. The in- 
vestigations that have been carried out on these procedures have been along two 
lines: the first being concerned with conditions under which the procedures
(i.e., the sequence $\{X_n\}$ of approximating random variables) converge, in some
sense, and the second being concerned with the speed of convergence and the 
asymptotic distribution of the procedures. For details concerning these investi- 
gations we refer the reader to the literature; some account of them may be found 
in Sections 3, 4, and 5 when they relate to the context. In particular the results 
relating to conditions for convergence are all subsumed in the work of Dvoretzky 
[7], Wolfowitz [12], and Block [1]. 
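
For orientation, the Robbins-Monro recursion in its usual textbook form can be sketched as below. The paper defines the procedure in its Section 3, which is not reproduced here, so the exact form used in this sketch is an assumption; the regression function, noise model, starting point and step constant are all hypothetical.

```python
import random


def robbins_monro(observe, x0, target=0.0, a=1.0, n_steps=20000):
    """Iterate X_{n+1} = X_n - (a / n) * (Y_n - target), where Y_n = observe(X_n)."""
    x = x0
    for n in range(1, n_steps + 1):
        y = observe(x)              # noisy observation of the regression function at x
        x -= (a / n) * (y - target)
    return x


rng = random.Random(0)


def noisy_m(x):
    """Hypothetical regression function M(x) = 2 (x - 1) observed with N(0, 1) noise."""
    return 2.0 * (x - 1.0) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_m, x0=5.0)
print(estimate)  # should be close to the root of M(x) = 0, which is 1
```

The step sizes a / n satisfy the summability conditions usually imposed on such sequences; how fast, and with what limiting distribution, the iterates approach the root is the subject of the paper.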

Chung [5] was the first to give any results about the asymptotic distribution 
of these procedures in his treatment of the Robbins-Monro procedure, and his 
methods (see the next paragraph) have been the basis for all work done hereto- 
fore in this direction. Hodges and Lehmann [8] improved some of Chung’s results. 
Derman [6] used Chung’s methods to obtain some results for the Kiefer-Wolfowitz
procedure and Burkholder [4] extended Chung’s methods to obtain further re- 
sults on the asymptotic distribution of the Kiefer-Wolfowitz procedure. 

Chung’s method for obtaining his results on the asymptotic normality of 
the appropriately normalized sequence $\{X_n\}$ is to compute sufficiently fine
estimates for the moments of $X_n - \theta$ ($\theta$ is the root of the regression equation)
and then to apply the method of moments. As we noted above all previous work 
on the asymptotic distribution of the two procedures in question has been based 
on Chung’s methods. The main feature of the present work is that we do away 
with the method of moments by, instead, utilizing a central limit theorem for 
dependent random variables and obtain more general and more complete results 
about the asymptotic normality of $\{X_n\}$ for both procedures by using a different
method of proof—the method of proof we use may be seen by referring to that 
portion of the proof of Theorem 1 which lies between (3.8) and (3.9c). In ad- 
dition, in Examples 1 and 2 in Section 4 we show that some of the results obtained 
here are best possible in a certain sense. 

One of the complications arising from use of the method of moments is that the
computations needed there are not feasible unless $\{a_n\}$ and $\{c_n\}$ (see (3.0) and
(4.0)) are of the special type $a_n = a n^{-\alpha}$, $c_n = c n^{-\gamma}$. While we take $a_n = a n^{-1}$
in Sections 3, 4, and 5 (see Section 6 for further remarks on this choice of a,) one 
of the reasons why we can obtain better results for the Kiefer-Wolfowitz pro- 
cedure than heretofore obtained is that the method of proof we use permits a 
wider choice for {c,}. Other desirable features of the method of proof presented 
here are that restrictions are needed only on the second moments of Z(x) (the 
method of moments requires restrictions on all moments) and that the method 
can be used without difficulty on some multi-dimensional analogues (see Section 
5) of the procedures. 

In Section 3 we treat the Robbins-Monro procedure and in Section 4 we discuss
the Kiefer-Wolfowitz procedure. Section 5 is devoted to some multi-dimensional
analogues of the procedures. Section 6 discusses some further consequences and
extensions of the results of earlier sections. In Section 2 we collect some lemmas
and computations which are used repeatedly in later sections. 

The author would like to take this opportunity to acknowledge his debt to 
Professors J. Kiefer and J. Wolfowitz for their direction and assistance during 
the course of this research. 


2. Preliminaries. In this section we will collect and prove several simple 
results which are used repeatedly in later sections. In addition, we will state and 
prove the central limit theorem which we use in succeeding sections. In what 
follows $D_1$, $D_2$, etc., will denote constants appropriately chosen to suit the
context in which they appear. 

Let $\{a_n\}$ be a sequence of positive real numbers such that

$$\sum_{n} a_n = \infty, \qquad \sum_{n} a_n^2 < \infty. \tag{2.0}$$

Except for Lemma 1 it will always be assumed in this section that $a_n = a n^{-1}$ for
some $a > 0$. Let

$$\beta_{mn} = \begin{cases} \prod_{j=m+1}^{n} (1 - a_j), & 0 \le m < n, \\ 1, & m = n. \end{cases} \tag{2.1}$$

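It is well known that

$$(1 - \epsilon_m) \exp\Bigl\{-\sum_{j=m+1}^{n} a_j\Bigr\} \le \beta_{mn} \le (1 + \epsilon_m) \exp\Bigl\{-\sum_{j=m+1}^{n} a_j\Bigr\} \tag{2.2}$$

for all $n \ge m$, where $\epsilon_m \to 0$ as $m \to \infty$. In particular, if, for $a > 0$, $a_n = a n^{-1}$
we have

$$(1 - \epsilon_m)\, m^{a} n^{-a} \le \beta_{mn} \le (1 + \epsilon_m)\, m^{a} n^{-a} \tag{2.3}$$

where $\epsilon_m \to 0$ as $m \to \infty$.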
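
A quick numerical check of the approximation in (2.3) can be sketched as follows, assuming $a_j = a/j$ as in the text; the function name and the particular value of $a$ are illustrative choices.

```python
def beta(m, n, a):
    """beta_{mn} = prod_{j=m+1}^{n} (1 - a / j); the empty product gives 1 when m == n."""
    prod = 1.0
    for j in range(m + 1, n + 1):
        prod *= 1.0 - a / j
    return prod


a = 0.75  # hypothetical value of the constant in a_j = a / j
for m in (10, 100, 1000):
    n = 100 * m
    # The ratio of beta_{mn} to m^a n^(-a) should approach 1 as m grows.
    print(m, beta(m, n, a) / (m / n) ** a)
```

The printed ratios move towards 1 as $m$ increases, in line with the requirement that the error factor depend only on $m$ and vanish in the limit.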


Lemma 1. Let $\{W_m\}$ be a sequence of real numbers converging to $W$ where $W$ may
be taken to be $\infty$. Then, for any positive integer $m_0$,

$$\lim_{n \to \infty} \sum_{m=m_0}^{n} a_m \beta_{mn} W_m = W.$$

Proof. For any fixed $m$ it follows from (2.2) that $\lim_{n \to \infty} a_m \beta_{mn} = 0$. Since
$a_m \beta_{mn} = \beta_{mn} - \beta_{m-1,n}$ we have, for any fixed $m_0$,

$$\lim_{n \to \infty} \sum_{m=m_0}^{n} a_m \beta_{mn} = \lim_{n \to \infty} (1 - \beta_{m_0 - 1,\, n}) = 1.$$

The conclusion of Lemma 1 now follows quite easily.
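
Both the telescoping identity used in the proof and the conclusion of Lemma 1 can be checked numerically. The sketch below assumes $a_m = a/m$ and a hypothetical sequence $W_m = 3 + 1/m$; the function name and the constants are illustrative only.

```python
def weighted_sum(w, m0, n, a):
    """Return (sum_{m=m0}^{n} a_m * beta_{mn} * w(m), beta_{m0-1,n}) for a_m = a / m."""
    total, beta_mn = 0.0, 1.0          # start at m = n, where beta_{nn} = 1
    for m in range(n, m0 - 1, -1):
        total += (a / m) * beta_mn * w(m)
        beta_mn *= 1.0 - a / m         # beta_{m-1,n} = (1 - a_m) * beta_{mn}
    return total, beta_mn


a, m0, n = 0.75, 1, 100000
ones, beta_last = weighted_sum(lambda m: 1.0, m0, n, a)
print(ones + beta_last)                                      # telescoping identity: equals 1
print(weighted_sum(lambda m: 3.0 + 1.0 / m, m0, n, a)[0])    # close to the limit W = 3
```

The weights $a_m \beta_{mn}$ sum to nearly 1 and put vanishing mass on any fixed $m$, so the weighted sum inherits the limit of the sequence.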
Let $\{c_m\}$ be a sequence of positive real numbers. For each $n$ let

$$h_n = \Bigl(\sum_{m=1}^{n} a_m^{2} c_m^{-2} \beta_{mn}^{2}\Bigr)^{-1/2}.$$

Lemma 2. Let $a > 1/2$ and suppose that $c_m \le c < \infty$ for all $m$. Then, if $m_0$ is
some fixed positive integer

$$\lim_{n \to \infty} h_n \beta_{m_0 n} = 0.$$
n-@® 
Proof. If $m_0 > a - 1$ let $m_1 = m_0$; otherwise, let $m_1$ be the smallest integer
greater than $a - 1$. To prove the lemma it is obviously sufficient to prove that
$h_n \beta_{m_1 n} \to 0$ as $n \to \infty$. Using (2.2) we obtain

$$h_n^{2} \beta_{m_1 n}^{2} \le D_1 n^{-2a} \Bigl(\sum_{m=m_1}^{n} a_m^{2} \beta_{mn}^{2} c_m^{-2}\Bigr)^{-1} \le D_2 \Bigl(\sum_{m=m_1}^{n} a_m \beta_{mn} c_m^{-2} m^{a-1} n^{a}\Bigr)^{-1}. \tag{2.4}$$

If $a > 1/2$ then $n^{a} m^{a-1} c_m^{-2} \ge c^{-2} m^{2a-1}$ which goes to $\infty$ as $m \to \infty$ and hence,
by Lemma 1, the last term in (2.4) goes to 0 as $n \to \infty$.
Sc < @ forall m. Let {W.»} be 


Lemma 3. Let a > 1/2 and suppose that c, S 
a sequence of real numbers converging to W where W may be ~. Then, if mo is a 


fixed positive integer 
lim hk, >> a’c,?m 32, Wn = W. 


no MIN 6 


Proof. The proof is easily accomplished upon noting that, for any fixed $m_0$,

$\lim_{n \to \infty} h_n^2 \sum_{m=m_0}^{n} a^2 c_m^{-2} m^{-2} \beta_{mn}^2 = 1.$


Lemma 4. Let $\{d_m\}$ be a sequence of positive numbers such that

(2.5)  $d_m d_{m+1}^{-1} = 1 + \epsilon_m m^{-1}$

where $\epsilon_m \to 0$ as $m \to \infty$. Let $q > -1$. Then, for any positive integer $m_0$,

$\sum_{m=m_0}^{n} d_m m^{q} \sim (1 + q)^{-1} d_n n^{q+1}.$

Examples of sequences satisfying (2.5) are easy to obtain. For example,
$d_m = (\log m)^{u}$ satisfies (2.5) for all real $u$. Note also that if $\{d_m\}$ satisfies (2.5)
then $\{d_m^{p}\}$ also satisfies (2.5) for any real number $p$.

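Proof. Since $\sum_{m=1}^{n} m^{q} \sim (1 + q)^{-1} n^{q+1}$, what we have to show will be
accomplished if we show that

$A_n = \dfrac{\sum_{m=1}^{n} d_m m^{q}}{d_n \sum_{m=1}^{n} m^{q}} - 1 = \dfrac{\sum_{m=1}^{n-1} (d_m - d_{m+1}) \sum_{j=1}^{m} j^{q}}{d_n \sum_{m=1}^{n} m^{q}} = \dfrac{\sum_{m=1}^{n-1} d_{m+1} d_n^{-1} (d_m d_{m+1}^{-1} - 1) \sum_{j=1}^{m} j^{q}}{\sum_{m=1}^{n} m^{q}}$

goes to 0 as $n \to \infty$. Using (2.5) we see that for $n$ sufficiently large

$d_n = d_{m_1} \prod_{j=m_1}^{n-1} d_{j+1} d_j^{-1} \ge D_3 \exp\{-\sum_{j=m_1}^{n-1} \epsilon_j j^{-1}\} \ge D_4 \exp\{-\epsilon \sum_{j=m_1}^{n-1} j^{-1}\} \ge D_5 n^{-\epsilon}$

where $m_1$ is chosen so that $|\epsilon_j| \le \epsilon < q + 1$ for all $j \ge m_1$. Thus $d_n n^{q+1} \to \infty$ as
$n \to \infty$ and, therefore, in order to show that $A_n \to 0$, we can start the outer sum
in the numerator of $A_n$ at $m = m_1$.

By use of (2.5) we have, for $n > m \ge m_1$,

$d_n^{-1} d_m = \prod_{j=m}^{n-1} d_j d_{j+1}^{-1} \le \prod_{j=m}^{n-1} (1 + \epsilon j^{-1}) \le D_6 n^{\epsilon} m^{-\epsilon}$

and

$|d_m d_{m+1}^{-1} - 1| \le \epsilon m^{-1}.$

That $A_n$ must go to 0 now follows because for all $n$

$\dfrac{\sum_{m=m_1}^{n-1} D_6 n^{\epsilon} m^{-\epsilon}\, \epsilon m^{-1} \sum_{j=1}^{m} j^{q}}{\sum_{m=1}^{n} m^{q}} \le \epsilon D_7 n^{\epsilon - q - 1} \sum_{m=m_1}^{n} m^{-\epsilon - 1} m^{q+1} \le \epsilon D_8$

and because $\epsilon$ is arbitrary.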

Lemma 5. Let $c_m = d_m m^{-r}$ where $r \ge 0$ and where $\{d_m\}$ satisfies (2.5). Let $a$ be
a real number greater than 1/2. Then, for any positive integer $m_0$, and any positive
number $p$,

(2.6)  $\sum_{m=m_0}^{n} a^2 c_m^{-p} m^{-2} \beta_{mn}^2 \sim a^2 (2a + rp - 1)^{-1} c_n^{-p} n^{-1}$

as $n \to \infty$. In particular, if $m_0 = 1$ and $p = 2$, (2.6) becomes

(2.7)  $h_n \sim a^{-1} (2a + 2r - 1)^{1/2} n^{1/2} c_n.$


Proof. Let $\epsilon > 0$ and let $m_1$ be large enough so that in (2.3) $\epsilon_m < \epsilon$ for $m \ge m_1$.
Then, using (2.3) and Lemma 4 (the conditions of Lemma 4 are satisfied
if one takes into account the conditions stated here and the remarks following the
statement of Lemma 4) we obtain

(2.8)  $a^2 \sum_{m=m_0}^{n} c_m^{-p} m^{-2} \beta_{mn}^2 \le (1 + \epsilon) a^2 \sum_{m=m_1}^{n} c_m^{-p} m^{2a-2} n^{-2a} + O(n^{-2a})$
       $= (1 + \epsilon) a^2 \sum_{m=m_1}^{n} d_m^{-p} m^{rp + 2a - 2} n^{-2a} + O(n^{-2a})$
       $\le (1 + \alpha_n) a^2 (2a + pr - 1)^{-1} d_n^{-p} n^{rp + 2a - 1} n^{-2a} + O(n^{-2a})$
       $= (1 + \alpha_n) a^2 (2a + pr - 1)^{-1} c_n^{-p} n^{-1} + O(n^{-2a})$

where $\alpha_n \to \epsilon$ as $n \to \infty$. Similar calculation produces

$a^2 \sum_{m=m_0}^{n} c_m^{-p} m^{-2} \beta_{mn}^2 \ge (1 - \alpha_n) a^2 (2a + pr - 1)^{-1} c_n^{-p} n^{-1}.$

Since $n^{1-2a} c_n^{p} \to 0$ as $n \to \infty$ and since $\epsilon$ is arbitrary we have achieved the
desired result.
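The asymptotic relation (2.7) can be compared with a direct evaluation of $h_n$. The sketch below assumes $a_j = a j^{-1}$ with $a \le 1$ (so all factors of $\beta_{mn}$ are nonnegative) and takes $d_m = 1$, so that $c_m = m^{-r}$; the names `h` and `h_asymptotic` are illustrative only.

```python
import math


def h(n, a, c=lambda m: 1.0):
    """h_n = (sum_{m=1}^n a^2 c_m^{-2} m^{-2} beta_{mn}^2)^{-1/2} with a_j = a / j."""
    beta = 1.0          # beta_{nn}
    total = 0.0
    for m in range(n, 0, -1):
        total += (a / m) ** 2 * c(m) ** -2 * beta ** 2
        beta *= 1.0 - a / m   # beta_{m-1,n} = (1 - a/m) * beta_{mn}
    return total ** -0.5


def h_asymptotic(n, a, r=0.0, c=lambda m: 1.0):
    """Right-hand side of (2.7): a^{-1} (2a + 2r - 1)^{1/2} n^{1/2} c_n."""
    return math.sqrt(2.0 * a + 2.0 * r - 1.0) / a * math.sqrt(n) * c(n)


if __name__ == "__main__":
    quarter = lambda m: m ** -0.25   # c_m = m^{-1/4}, i.e. r = 1/4 and d_m = 1
    print(h(20_000, 0.8), h_asymptotic(20_000, 0.8))
    print(h(10_000, 1.0, quarter), h_asymptotic(10_000, 1.0, 0.25, quarter))
```

For $a = 1$ and $c_m = 1$ the product collapses to $\beta_{mn} = m/n$, so $h_n = n^{1/2}$ exactly, which agrees with (2.7).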

We shall now state and prove a central limit theorem which we use later in an 
essential way. The multi-dimensional version we give (see Lemma 6) is a direct 
generalization of the one-dimensional result which may be found in Loève
[10], p. 377 C. The proof we give is likewise a direct generalization of the proof
given in [10]. In Sections 3 and 4 it will suffice to consider only the one-dimen- 
sional case; we make use of the result for higher dimensions in Section 5. 

With all vectors considered as elements of $q$-dimensional Euclidean space we
adopt the following notation. If $x$, $y$ are vectors, $[x, y]$ will denote their inner
product. The norm of a vector $x$ we denote by $|x|$ and, of course, is equal to
$[x, x]^{1/2}$. If $B$ is a $q \times q$ matrix we define, in the usual way,

$\|B\| = \sup_{|x| = 1} [Bx, Bx]^{1/2}.$

The obvious facts that $|Bx| \le \|B\|\, |x|$ and that $\|B_1 B_2\| \le \|B_1\|\, \|B_2\|$ will be
useful below. $I$ will denote the identity $q \times q$ matrix. $B'$ and $x'$ will denote the
transposes of the matrix $B$ and vector $x$ respectively. Unless otherwise indicated
a vector is to be considered a column vector.

Let $\{U_{nk};\ 1 \le k \le n,\ n \ge 1\}$ be a family of vector random variables, the
distribution of $U_{nk}$ being denoted by $F_{nk}$. Let $V_{nk} = (U_{n1}, \ldots, U_{n,k-1})$ and
suppose that $E(U_{nk} \mid V_{nk}) = 0$ with probability one. Denote the covariance
matrix of $U_{nk}$ by $s_{nk}$, i.e., $s_{nk} = E(U_{nk} U_{nk}')$. Let $r_{nk} = E(U_{nk} U_{nk}' \mid V_{nk})$. Let
$U_n = \sum U_{nk}$, $s_n = \sum s_{nk}$, and $r_n = \sum r_{nk}$ where all three summations are over
$1 \le k \le n$. For $\epsilon > 0$ define $\phi_{nk}^{\epsilon} = 1$ if $|U_{nk}| > \epsilon$, $\phi_{nk}^{\epsilon} = 0$ otherwise.

Lemma 6. If

(2.9)  $\lim_{n \to \infty} \sum_{k=1}^{n} E(\|r_{nk} - s_{nk}\|) = 0,$

(2.10)  $\sup_n \sum_{k=1}^{n} E(|U_{nk}|^2) < \infty,$

and, for every $\epsilon > 0$,

(2.11)  $\lim_{n \to \infty} \sum_{k=1}^{n} E(|U_{nk}|^2 \phi_{nk}^{\epsilon}) = 0,$

and if $s_n \to s$, i.e., $\|s_n - s\| \to 0$, then $U_n$ is asymptotically normal with mean
0 and covariance matrix $s$.

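Proof. Let $F$ and $G$ be $q$-dimensional distribution functions with characteristic
functions $f$ and $g$ and finite covariance matrices $C$ and $D$ respectively, and
let $H = F - G$. Let $\theta_1$, $\theta_2$ denote quantities whose absolute value is less than 1.
Let $A = \{x : |x| \le \epsilon\}$ and let $A'$ be the complement of $A$. Then, for fixed $t$ and
$\epsilon < 1/|t|$,

(2.12)  $|f(t) - g(t)| \le \left| \int [t, x]\, dH(x) \right| + \left| \int [t, x]^2\, dH(x) \right| + \left| \int_A \theta_1 [t, x]^3\, dH(x) \right| + \left| \int_{A'} \theta_2 [t, x]^2\, dH(x) \right|$
        $\le \left| \int [t, x]\, dH(x) \right| + \left| \int [t, x]^2\, dH(x) \right| + \epsilon |t|^3 \int |x|^2\, d(F + G) + |t|^2 \int_{A'} |x|^2\, d(F + G)$
        $\le \left| \int [t, x]\, dH(x) \right| + |t|^2 \|C - D\| + \epsilon |t|^3 \int |x|^2\, d(F + G) + |t|^2 \int_{A'} |x|^2\, d(F + G).$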

Let $G_{nk}$ denote the normal distribution with mean 0 and covariance matrix
$s_{nk}$. Let $\{Y_{nk};\ 1 \le k \le n,\ n \ge 1\}$ be a family of independent random variables
with the distribution of $Y_{nk}$ being $G_{nk}$. In addition, take $\{Y_{nk}\}$ to be independent
of $\{U_{nk}\}$. It is easy to see that $Y_n = Y_{n1} + \cdots + Y_{nn}$ is asymptotically normal
with mean 0 and covariance matrix $s$.

Let $f_{nk}$, $f_n$, $g_{nk}$, and $g_n$ denote the characteristic functions of $U_{nk}$, $U_n$, $Y_{nk}$
and $Y_n$ respectively. Let

$f_{nk}'(t) = E(e^{i[t, U_{nk}]} \mid V_{nk}).$

To prove the lemma it is clearly sufficient to prove that, for each fixed $t$,
$\lim_{n \to \infty} |f_n(t) - g_n(t)| = 0$.

Let $W_{nk} = U_{n1} + \cdots + U_{n,k-1} + Y_{n,k+1} + \cdots + Y_{nn}$ for $1 < k < n$,
$W_{n1} = Y_{n2} + \cdots + Y_{nn}$, $W_{nn} = U_{n1} + \cdots + U_{n,n-1}$. Then

(2.13)  $|f_n(t) - g_n(t)| = |E(e^{i[t, U_n]} - e^{i[t, Y_n]})| = \left| \sum_{k=1}^{n} E\{(e^{i[t, U_{nk}]} - e^{i[t, Y_{nk}]})\, e^{i[t, W_{nk}]}\} \right| \le \sum_{k=1}^{n} E|f_{nk}'(t) - g_{nk}(t)|.$
From (2.12), (2.13), and the fact that $E(U_{nk} \mid V_{nk}) = 0$ we obtain

(2.14)  $|f_n(t) - g_n(t)| \le |t|^2 \sum_{k=1}^{n} E\|r_{nk} - s_{nk}\| + 2\epsilon |t|^3 \sum_{k=1}^{n} E|U_{nk}|^2 + |t|^2 \sum_{k=1}^{n} E(\phi_{nk}^{\epsilon} |U_{nk}|^2) + |t|^2 \sum_{k=1}^{n} \int_{A'} |x|^2\, dG_{nk}.$

As $n \to \infty$ the first and third terms on the right-hand side of (2.14) go to 0 because
of (2.9) and (2.11), the second term is $O(\epsilon)$ because of (2.10), and the last term
goes to 0 because $G_{nk}$ is normal with covariance matrix $s_{nk}$ and $\|s_{nk}\|$ goes to 0 as
$n \to \infty$ uniformly in $k \le n$. Since $\epsilon$ is arbitrary this finishes the proof of the
lemma.


3. The Robbins-Monro Procedure. Let $M$ be a fixed function such that the
equation $M(x) = \alpha$ has a unique solution $x = \theta$. For each $x$ let $Y(x)$ be a random
variable with $EY(x) = M(x)$. The Robbins-Monro procedure for "finding" $\theta$
is defined as follows. Let $\{a_n, n > 0\}$ be a sequence of positive numbers such that

(3.0)  $\sum a_n = \infty, \qquad \sum a_n^2 < \infty.$

Let $X_1$ be some fixed number ($X_1$ may be taken to be an arbitrary random
variable for what follows since, if $EX_1^2 < \infty$, the same proofs will hold, while, if
$EX_1^2 = \infty$, the results are obtained by truncating $X_1$ and using the results for the
case $EX_1^2 < \infty$) and define $\{X_n, n > 1\}$ by the recursion

(3.1)  $X_{n+1} = X_n - a_n (Y(X_n) - \alpha)$

where $Y(X_n)$ is a random variable whose conditional distribution given
$X_1 = x_1, \ldots, X_n = x_n$ is the same as the distribution of $Y(x_n)$. Letting
$Z(x) = Y(x) - M(x)$, (3.1) becomes

(3.2)  $X_{n+1} = X_n - a_n (M(X_n) - \alpha) - a_n Z(X_n),$

$EZ(x) = 0$ for all $x$, and the conditional distribution of $Z(X_n)$ given $X_1 =
x_1, \ldots, X_n = x_n$ is the same as the distribution of $Z(x_n)$. We note for future use
that, as a consequence of this,

(3.3)  $E(Z(X_n) \mid Z(X_1), \ldots, Z(X_{n-1})) = 0$

with probability one.
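The recursion (3.1) is simple to simulate. The following is a minimal sketch, assuming $a_n = A n^{-1}$, Gaussian observation noise, and a hypothetical regression function $M(x) = (x - \theta) + 0.5 \sin(x - \theta)$ with $\alpha = 0$; none of these specific choices come from the paper.

```python
import math
import random


def robbins_monro(y, alpha, x1, n_steps, A=1.0):
    """Iterate X_{n+1} = X_n - a_n (Y(X_n) - alpha) with a_n = A / n; return X_{n_steps+1}."""
    x = x1
    for n in range(1, n_steps + 1):
        x -= (A / n) * (y(x) - alpha)
    return x


def make_noisy_observation(theta, sigma, rng):
    """Hypothetical Y(x) = M(x) + Z, with M(x) = d + 0.5 sin(d), d = x - theta,
    and Z normal with mean 0 and standard deviation sigma."""
    def y(x):
        d = x - theta
        return d + 0.5 * math.sin(d) + rng.gauss(0.0, sigma)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    y = make_noisy_observation(theta=2.0, sigma=1.0, rng=rng)
    print(robbins_monro(y, alpha=0.0, x1=5.0, n_steps=20_000))
```

With this $M$, $(x - \theta) M(x) > 0$ for $x \ne \theta$ and $|M(x)|$ is bounded above and below by multiples of $|x - \theta|$, so the iterate should settle near $\theta$.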

For Theorem 1 we make the following assumptions about $M(x)$ and $Z(x)$.
The connection between these assumptions and those made by previous authors
is pointed out below.

Assumption (A1). $M$ is a Borel-measurable function; $M(\theta) = \alpha$ and

$(x - \theta)(M(x) - \alpha) > 0$

for all $x \ne \theta$.

Assumption (A2). For some positive constants $K$ and $K_1$, and for all $x$,

$K |x - \theta| \le |M(x) - \alpha| \le K_1 |x - \theta|.$

Assumption (A3). For all $x$,

$M(x) = \alpha + \alpha_1 (x - \theta) + \delta(x, \theta)$

where $\delta(x, \theta) = o(|x - \theta|)$ as $x - \theta \to 0$ and where $\alpha_1 > 0$.

Assumption (A4).

(a) $\sup_x EZ^2(x) < \infty$;  (b) $\lim_{x \to \theta} EZ^2(x) = \sigma^2$.

Assumption (A5).

$\lim_{R \to \infty} \lim_{\epsilon \to 0+} \sup_{|x - \theta| < \epsilon} \int_{\{|Z(x)| > R\}} Z^2(x)\, dP = 0.$
When $X_n \to \theta$ with probability one (for example, under (A1), (A2'), and
(a) of (A4) as shown by Blum [2]), and when (a) of (A4) holds, (A5) implies

(3.4)  $\lim_{R \to \infty} \sup_k \int_{\{|Z(X_k)| > R\}} Z^2(X_k)\, dP = 0$

which is actually what is used in the proof below. The reason we state (A5) in
the way we have is that it appears as a more natural condition than (3.4).
Simple conditions which imply (A5) are given by

(3.5)  $\{Z(x)\}$ are identically distributed

or

(3.6)  $\sup_{|x - \theta| < \epsilon} E|Z(x)|^{2+\nu} < \infty$

for some $\epsilon > 0$ and some $\nu > 0$.

Assumption (A2) can be weakened to

Assumption (A2'). For all $x$ and some positive constant $K_1$,

$|M(x) - \alpha| \le K_1 |x - \theta|$

and, for every $t_1$, $t_2$ such that $0 < t_1 < t_2 < \infty$,

$\inf_{t_1 \le |x - \theta| \le t_2} |M(x) - \alpha| > 0.$

In Theorem 1' we obtain the result of Theorem 1 with (A2') replacing (A2); the
truncation device used in the proof there is due to Hodges and Lehmann [8].

Under (A1), (A2), (A3), the assumption that $EZ^2(x) = \sigma^2$ for all $x$, and the
assumption that (3.6) hold for $\epsilon = \infty$ and all $\nu$, Chung obtained the result of
Theorem 1 (this is what is referred to in [5] as the "second case"). Hodges and
Lehmann proved the result of Theorem 1 under (A1), (A2'), (A3), (A4), and the
assumption that (3.6) hold for some $\epsilon > 0$ and all $\nu$. Thus Theorem 1' includes
these earlier results by virtue of the greater generality of (A5) over related
conditions made by previous authors.
As before $D_1$, $D_2$, etc., will denote positive constants appropriately chosen for
the context in which they appear.

THEOREM 1. Suppose that Assumptions (A1) through (A5) are satisfied. Let
$a_n = A n^{-1}$ for $n > 0$ where $A$ is such that $2KA > 1$. Then $n^{1/2}(X_n - \theta)$ is
asymptotically normally distributed with mean 0 and variance $A^2 \sigma^2 (2A\alpha_1 - 1)^{-1}$.
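The limiting variance in Theorem 1 can be compared with a small Monte Carlo experiment. This is a sketch under simplifying assumptions that are not part of the theorem: a linear $M(x) = \alpha_1 (x - \theta)$ with $\alpha = 0$, Gaussian noise with constant variance $\sigma^2$, and the scaled error $n^{1/2}(X_{n+1} - \theta)$ used in place of $n^{1/2}(X_n - \theta)$. All function names are illustrative.

```python
import random
import statistics


def scaled_error(n_steps, A, alpha1, sigma, rng, theta=0.0, x1=1.0):
    """Run (3.1) with a_n = A / n for M(x) = alpha1 * (x - theta), alpha = 0,
    and return n^{1/2} (X_{n+1} - theta)."""
    x = x1
    for n in range(1, n_steps + 1):
        y = alpha1 * (x - theta) + rng.gauss(0.0, sigma)
        x -= (A / n) * y
    return n_steps ** 0.5 * (x - theta)


def empirical_variance(reps, n_steps, A, alpha1, sigma, seed=0):
    """Sample variance of the scaled error over independent replications."""
    rng = random.Random(seed)
    draws = [scaled_error(n_steps, A, alpha1, sigma, rng) for _ in range(reps)]
    return statistics.variance(draws)


def limiting_variance(A, alpha1, sigma):
    """A^2 sigma^2 / (2 A alpha1 - 1), the variance stated in Theorem 1."""
    return A ** 2 * sigma ** 2 / (2.0 * A * alpha1 - 1.0)


if __name__ == "__main__":
    print(empirical_variance(500, 1000, 2.0, 1.0, 1.0), limiting_variance(2.0, 1.0, 1.0))
```

In this linear case $K = K_1 = \alpha_1$, so the condition $2KA > 1$ reads $2A\alpha_1 > 1$, and the two printed numbers should be of comparable size up to Monte Carlo error.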

Proof. There is no loss in assuming that $\alpha = \theta = 0$. Abbreviating $\delta(X_n, 0)$,
$M(X_n)$ and $Z(X_n)$ by $\delta_n$, $M_n$ and $Z_n$ respectively, and using (A3) we rewrite
(3.2) and obtain

(3.7)  $X_{n+1} = (1 - A\alpha_1 n^{-1}) X_n - A n^{-1} \delta_n - A n^{-1} Z_n.$

Let $a = A\alpha_1$ and let $\beta_{mn}$ be as in (2.1) with the $a_j$ in (2.1) replaced by $a j^{-1}$.
Iteration of (3.7) then yields

(3.8)  $X_{n+1} = \beta_{0n} X_1 - A \sum_{m=1}^{n} m^{-1} \beta_{mn} \delta_m - A \sum_{m=1}^{n} m^{-1} \beta_{mn} Z_m.$

Let $h_n = \left( \sum_{m=1}^{n} a^2 m^{-2} \beta_{mn}^2 \right)^{-1/2}$. Then, by Lemma 5,

$h_n \sim (2a - 1)^{1/2} a^{-1} n^{1/2}.$

Hence, proving that $n^{1/2} X_n$ is asymptotically normal with mean 0 and variance
$A^2 \sigma^2 (2a - 1)^{-1}$ is equivalent to proving

(3.9)  $h_n X_n$ is asymptotically normal with mean 0 and variance $A^2 \sigma^2 a^{-2}$.

Using (3.8) it is clear that we can show (3.9) by proving

(3.9a)  $h_n \beta_{0n} \to 0,$

(3.9b)  $h_n \sum_{m=1}^{n} a m^{-1} \beta_{mn} \delta_m \to 0$ in probability,

(3.9c)  $h_n \sum_{m=1}^{n} a m^{-1} \beta_{mn} Z_m$ is asymptotically normal with mean 0 and variance $\sigma^2$.


(3.9a) follows immediately from Lemma 2 with $c_m = 1$ for all $m$. To prove
(3.9c) we will invoke Lemma 6 with $q = 1$ and $U_{nk} = h_n a k^{-1} \beta_{kn} Z_k$. To see that
we can do so observe first that by (3.3)

$E(U_{nk} \mid U_{n1}, \ldots, U_{n,k-1}) = E(U_{nk} \mid Z_1, \ldots, Z_{k-1}) = 0.$

Let $\phi_{nk} = 1$ if $|U_{nk}| \ge \epsilon$ and $\phi_{nk} = 0$ otherwise, and observe that in order to
verify (2.11) we have to check that $\sum_{k=1}^{n} E(\phi_{nk} U_{nk}^2) \to 0$ or, what is the same,

(3.10)  $h_n^2 \sum_{k=1}^{n} a^2 k^{-2} \beta_{kn}^2 E(\phi_{nk} Z_k^2) \to 0.$

Noticing, by Lemma 5 and (2.3), that $\phi_{nk} = 1$ implies, for some $\zeta > 0$, that
$|Z_k| \ge \epsilon h_n^{-1} a^{-1} k \beta_{kn}^{-1} \ge \zeta k^{1/2}$, we apply (3.4), which is obtained from (A5), and
obtain

(3.11)  $\lim_{k \to \infty} E(\phi_k Z_k^2) = 0$

where $\phi_k = 1$ if $|Z_k| \ge \zeta k^{1/2}$ and $\phi_k = 0$ otherwise. Since $\phi_{nk} \le \phi_k$, applying
Lemma 3 with $c_m = 1$ for all $m$ and using (3.11) shows that (3.10) is valid.
Verifying (2.9) is equivalent to showing

(3.12)  $\lim_{n \to \infty} h_n^2 \sum_{k=1}^{n} a^2 k^{-2} \beta_{kn}^2\, E\,|E'[Z^2(X_k)] - EZ^2(X_k)| = 0$

where $E'$ denotes conditional expected value with the conditioning being by
$V_{nk}$. Use again of Lemma 3 shows that it is sufficient to prove

(3.13)  $\lim_{k \to \infty} E\,|E'(Z^2(X_k)) - EZ^2(X_k)| = 0.$

But (3.13) follows easily by observing that the expression between the absolute
value signs is uniformly bounded ((a) of (A4)) so that Lebesgue's theorem is
applicable, and by observing that (b) of (A4) together with the convergence of
$X_n$ to $\theta$ w.p.1 imply

(3.14)  $\lim_{k \to \infty} E'[Z^2(X_k)] = \lim_{k \to \infty} EZ^2(X_k) = \sigma^2.$

(3.14) and Lemma 3 also serve to show that (2.10) is satisfied with

(3.15)  $\lim s_n = \sigma^2.$

This completes the verification that Lemma 6 is applicable and therefore
establishes (3.9c).

To prove (3.9b) we require the estimate that $EX_n^2 = O(n^{-1})$. This estimate is
obtained by Chung [5] but we obtain it here for completeness. The methods are
essentially the same.

Squaring both sides of (3.2), taking expected values, and using (A4) we get

(3.16)  $EX_{n+1}^2 = E(X_n - A n^{-1} M_n)^2 + O(n^{-2}).$

Then, by (A1) and (A2), for $\epsilon$ sufficiently small so that $2KA - \epsilon > 1$, and for
$n$ sufficiently large, say $n > N_1$,

(3.17)  $EX_{n+1}^2 \le (1 - 2KA n^{-1} + A^2 K_1^2 n^{-2}) EX_n^2 + O(n^{-2}) \le (1 - (2KA - \epsilon) n^{-1}) EX_n^2 + D_1 n^{-2}.$

Let $p = 2KA - \epsilon$ and let $\beta_{mn}$ be defined by (2.1) with $a_j = p j^{-1}$. Choose $N_1$
large enough so that $p < N_1$ (this is to guarantee that $\beta_{mn} > 0$ for $m \ge N_1$ so
that (3.18) can hold). Iteration of (3.17) yields

(3.18)  $EX_{n+1}^2 \le \sum_{m=N_1+1}^{n} D_1 m^{-2} \beta_{mn} + \beta_{N_1 n} EX_{N_1+1}^2 \le D_2 n^{-1} + D_3 n^{-p} = O(n^{-1})$

which is the estimate we require.
Let $t > 0$. Since $\delta(x) = o(|x|)$, for $t > 0$ we can find $\epsilon > 0$ with the property
that

(3.19)  $|\delta(x)| \le t |x|$ for $|x| \le \epsilon$.

As was pointed out above $X_n \to 0$ w.p.1; hence, we can choose $N_2$ so that

(3.20)  $P\{|X_j| \le \epsilon,\ j \ge N_2\} > 1 - t.$

Let $N_3$ be larger than $N_1$ and $N_2$ and such that $a < N_3 + 1$. Then, denoting
$h_n \sum_{m=N_3}^{n} a m^{-1} \beta_{mn} \delta_m$ by $V_n$ and $h_n \sum_{m=N_3}^{n} a m^{-1} \beta_{mn} |X_m|$ by $V_n'$, and using
(3.20), (3.19), a Chebyshev-type inequality, (3.18), and Lyapounov's inequality,
and (2.3) we have for $n > N_3$,

(3.21)  $P\{|V_n| > t^{1/2}\} \le t + P\{|V_n| > t^{1/2},\ |X_j| \le \epsilon,\ j \ge N_3\}$
        $\le t + P\{t V_n' > t^{1/2}\} \le t + t^{1/2} E V_n'$
        $\le t + D_4 t^{1/2} h_n \sum_{m=N_3}^{n} m^{-1} \beta_{mn} m^{-1/2} \le D_5 t^{1/2}.$

(3.21) together with the fact that $h_n \beta_{mn} \to 0$ for any fixed $m$ (Lemma 2)
establishes (3.9b) and finishes the proof of the theorem.

THEOREM 1'. Suppose that Assumptions (A1), (A2'), (A3), (A4), and (A5)
are satisfied. Let $a_n = A n^{-1}$ where $A$ is such that $A\alpha_1 > 1/2$. Then $n^{1/2}(X_n - \theta)$
is asymptotically normally distributed with mean 0 and variance $A^2 \sigma^2 (2A\alpha_1 - 1)^{-1}$.

Proof. We assume with no loss of generality that $\alpha = \theta = 0$. Let $t > 0$ be
such that $A(\alpha_1 - t) > 1/2$. Let $K = \alpha_1 - t$. Then we can find an $\epsilon > 0$ such that
for $|x| \le \epsilon$

(3.22)  $K |x| \le |M(x)| \le K_1 |x|.$

Define $M'(x) = M(x)$ if $|x| \le \epsilon$, $M'(x) = Kx$ if $|x| > \epsilon$.

Since under (A1), (A2'), and (A4), $X_n \to 0$ w.p.1 we can find $N$ so that for
$u > 0$,

(3.23)  $P\{|X_j| \le \epsilon,\ j \ge N\} > 1 - u.$

Let $X_1' = X_{N+1}$ and define $\{X_n', n \ge 1\}$ by the recursion

(3.24)  $X_{n+1}' = X_n' - a_{N+n} M'(X_n') - a_{N+n} Z(X_n').$

It is clear that the assumptions of Theorem 1' together with (3.22) show that
Theorem 1 is applicable to $X_n'$, $M'$, $\{a_{n+N}\}$. Hence, for all $y$,

(3.25)  $\lim_{n \to \infty} P\{(N + n)^{1/2} X_{n+1}' \le y\} = F(y)$

where $F$ is the normal distribution function with mean 0 and variance
$A^2 \sigma^2 (2A\alpha_1 - 1)^{-1}$. Using (3.23) and (3.25) we obtain

(3.26)  $\lim_{n \to \infty} P\{n^{1/2} X_n \le y\} = \lim_{n \to \infty} P\{(n + N)^{1/2} X_{n+N} \le y\}$
        $\le \lim_{n \to \infty} P\{(n + N)^{1/2} X_{n+N} \le y,\ |X_j| \le \epsilon,\ j \ge N\} + u$
        $\le \lim_{n \to \infty} P\{(n + N)^{1/2} X_n' \le y\} + u$
        $= F(y) + u.$

Similarly, we obtain

(3.27)  $\lim_{n \to \infty} P\{n^{1/2} X_n \le y\} \ge F(y) - u.$

Since $u$ and $y$ are arbitrary putting (3.26) and (3.27) together finishes the proof
of the theorem.


4. The Kiefer-Wolfowitz Procedure. Let $M$ be a fixed function with a unique
maximum at $x = \theta$ (by making the obvious alterations in what follows we can
replace "maximum" by "minimum"). For each $x$ let $Y(x)$ be a random variable
with $EY(x) = M(x)$. The Kiefer-Wolfowitz procedure for locating the maximum
is defined as follows. Let $\{a_n\}$, $\{c_n\}$ be two sequences of positive numbers such
that

(4.0)  $\sum a_n = \infty, \qquad c_n \to 0, \qquad \sum a_n^2 c_n^{-2} < \infty.$

Let $X_1$ be a fixed number (by the same reasoning as in Section 3, $X_1$ can be taken
to be an arbitrary random variable for what follows) and define $\{X_n, n \ge 2\}$ by
the recursion

(4.1)  $X_{n+1} = X_n - a_n c_n^{-1} [Y(X_n - c_n) - Y(X_n + c_n)]$

where $Y(X_n \pm c_n)$ is a random variable whose conditional distribution given
$X_1 = x_1, \ldots, X_n = x_n$ is the same as the distribution of $Y(x_n \pm c_n)$. It is
usually assumed that $Y(X_n - c_n)$ and $Y(X_n + c_n)$ are conditionally independent,
i.e., for all Borel sets $A$ and $B$, $P\{Y(X_n + c_n) \in A,\ Y(X_n - c_n) \in B \mid X_n\} =
P\{Y(X_n + c_n) \in A \mid X_n\}\, P\{Y(X_n - c_n) \in B \mid X_n\}$. Though this is commonly the
case in practice we do not make this assumption since it is unnecessary to do so.
Whatever assumptions we do need to make about the joint distribution of
$Y(X_n - c_n)$ and $Y(X_n + c_n)$ are contained in (B5). Letting $Z(x) = Y(x) - M(x)$
and writing $M_n$ for $M(X_n - c_n) - M(X_n + c_n)$ and $Z_n$ for $Z(X_n - c_n) -
Z(X_n + c_n)$, (4.1) becomes

(4.2)  $X_{n+1} = X_n - a_n c_n^{-1} (M_n + Z_n),$

$EZ(x) = 0$ for all $x$, and the conditional distribution of $Z_n$ given $X_1 = x_1, \ldots,
X_n = x_n$ is the same as the distribution of $Z(x_n - c_n) - Z(x_n + c_n)$. We note
that, as a consequence of this,

(4.3)  $E(Z_n \mid Z_1, \ldots, Z_{n-1}) = E(Z_n \mid X_1, \ldots, X_n) = 0$

with probability one.
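The recursion (4.1) can be sketched in a few lines. The example below assumes $a_n = A n^{-1}$, $c_n = n^{-1/4}$ (so that the three conditions in (4.0) hold), and a hypothetical noisy quadratic $Y(x) = -(x - \theta)^2 + \text{noise}$ with independent Gaussian errors at the two design points; these are illustrative choices rather than anything prescribed by the paper.

```python
import random


def kiefer_wolfowitz(y, x1, n_steps, A=0.5, c=lambda n: n ** -0.25):
    """Iterate X_{n+1} = X_n - a_n c_n^{-1} [Y(X_n - c_n) - Y(X_n + c_n)]
    with a_n = A / n; return X_{n_steps+1}."""
    x = x1
    for n in range(1, n_steps + 1):
        c_n = c(n)
        x -= (A / n) / c_n * (y(x - c_n) - y(x + c_n))
    return x


def make_noisy_quadratic(theta, sigma, rng):
    """Hypothetical Y(x) = -(x - theta)^2 + Z, Z normal(0, sigma^2), so M peaks at theta."""
    def y(x):
        return -(x - theta) ** 2 + rng.gauss(0.0, sigma)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    y = make_noisy_quadratic(theta=1.0, sigma=0.5, rng=rng)
    print(kiefer_wolfowitz(y, x1=0.0, n_steps=20_000))
```

For this quadratic, $M(x - c) - M(x + c) = 4c(x - \theta)$, so each step moves the iterate toward $\theta$ on average while the noise term is damped by $a_n c_n^{-1}$.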
We now make the assumptions we require for Theorem 2. Other assumptions
relevant to later theorems are listed further on. The connection between these
assumptions and those made by previous authors is pointed out below. As before
$D_1$, $D_2$, etc. will denote appropriately chosen positive constants.

Assumption (B1). $M(x)$ is a Borel-measurable function, has a unique maximum


at x 6, and, for0 <h <i < kh < @, 
(44 nf (x — 6)(M (x -—_= M(x + ¢) > 0 
In addition, for all $x$ and suitable $D_1$ and $D_2$,

(4.5)  $|M(x + 1) - M(x)| < D_1 + D_2 |x|.$

Assumption (B2). For all $x$,

$M(x) = \alpha_0 - \alpha (x - \theta)^2 + \delta(x, \theta)$

where $\alpha_0$ is some real number, $\alpha > 0$, and $\delta(x, \theta) = o(|x - \theta|^2)$ as $x - \theta \to 0$.

Assumption (B3). For some $c_0 > 0$ there exist positive constants $K_1$ and $K_2$
such that, for all $x$ and all $c$ for which $0 < c \le c_0$,

$K_1 (x - \theta)^2 \le (x - \theta)[M(x - c) - M(x + c)] c^{-1} \le K_2 (x - \theta)^2.$

Assumption (B4). For every $\epsilon > 0$ there exists $c_\epsilon > 0$ such that, for all $c$
satisfying $0 < c \le c_\epsilon$ and all $x$ satisfying $|x - \theta| < c$,

$|\delta(x - c, \theta) - \delta(x + c, \theta)|\, c^{-1} \le \epsilon |x - \theta|.$

Assumption (B5).

(4.6)  $\sup_x EZ^2(x) = s < \infty,$

(4.7)  $\lim_{x \to \theta,\ \epsilon \to 0} E[Z(x - \epsilon) - Z(x + \epsilon)]^2 = \sigma^2.$

In case $Z(X_m - c_m)$ and $Z(X_m + c_m)$ are uncorrelated we can replace (4.7) by

(4.8)  $\lim_{x \to \theta} EZ^2(x) = \sigma^2 / 2.$

Assumption (B6).

$\lim_{R \to \infty} \lim_{\epsilon \to 0+} \sup_{|x - \theta| < \epsilon} \int_{\{|Z(x)| > R\}} Z^2(x)\, dP = 0.$


When $X_n \to \theta$ w.p.1 (for example, under (B1) and (4.6) as shown by Burkholder
[4] and Dvoretzky [7]; Blum [2] proved convergence w.p.1 earlier but
under stronger restrictions) (B6) implies

(4.9)  $\lim_{R \to \infty} \sup_k \int_{\{|Z_k| > R\}} Z_k^2\, dP = 0.$

The remarks made about (3.4) and (A5) pertain here to (4.9) and (B6) and, as
with (A5), (B6) is satisfied if either (3.5) or (3.6) is fulfilled.
In Theorem 2' we obtain the same result as in Theorem 2 with (B3) replaced
by the weaker restriction

Assumption (B3'). For some $c_0 > 0$ there exist positive constants $K_1$ and $K_2$
such that, for all $x$ in some neighborhood of $\theta$ and all $c$ for which $0 < c \le c_0$,

$K_1 (x - \theta)^2 \le (x - \theta)[M(x - c) - M(x + c)] c^{-1} \le K_2 (x - \theta)^2.$

(B3) ((B3')) is used only for Theorem 2 (2'); it is replaced by a different
condition for later theorems. (B4), which is also used only for Theorems 2 and 2', is
fulfilled whenever $M$ satisfies (B2), (B3'), and has a continuous second derivative
in some neighborhood of $\theta$ with $M''(\theta) = -2\alpha$ (i.e., $\delta''(\theta) = 0$). When
(B2), (B3'), and (B4) hold simultaneously it is redundant to require the lower
inequality in (B3').

It is easy to see that (B3) (also (B3')) implies that $M$ is symmetric in some
neighborhood of $\theta$; in fact, $M(\theta - c) = M(\theta + c)$ for all $c \le c_0$. If (B3) is
satisfied and the interval of symmetry is known, i.e., $c_0$ is known, Burkholder was
able to show that modifying the Kiefer-Wolfowitz procedure by taking $c_n = c_0$
for all $n$ will yield, under certain additional restrictions, the fact that $n^{1/2} X_n$
is asymptotically normal with mean 0 and a certain variance. It is easy to check
that this result can be obtained, under Assumptions (B1) through (B6) and the
assumption that $M(x - c_0) - M(x + c_0)$ is differentiable at $x = \theta$, by using
Theorem 1, replacing the $M(x)$ in Theorem 1 by $[M(x - c_0) - M(x + c_0)] / c_0$.
Since this modification depends on knowing $c_0$ it will usually be undesirable.
Theorem 2 (also 2') gives a result using the Kiefer-Wolfowitz procedure which
has the advantage of not depending on $c_0$. This gain, however, is offset, if $c_0$ is
known, by the fact that, in general, for the Kiefer-Wolfowitz procedure $X_n$ can
never be $O_p(n^{-1/2})$ (see Example 1). However, as noted in the remarks following
the proof of Theorem 2, $\{a_n\}$ and $\{c_n\}$ can be chosen so that $X_n$ is arbitrarily
close to being $O_p(n^{-1/2})$ without ever attaining it.

Under a stronger set of assumptions than (B1) through (B6) Derman [6]
proves a weaker result than the one we prove in Theorem 2. Using Chung's
methods he shows that for any $t < 1/2$ there exist sequences $\{a_n\}$ and $\{c_n\}$ such
that $n^{t}(X_n - \theta)$ is asymptotically normal with mean 0 and a certain variance.

THEOREM 2. Suppose that Assumptions (B1) through (B6) are satisfied. Let
$AK_1 > 1/2$ and take

(4.10)  $a_n = A n^{-1}.$

Let $\{c_n\}$ be a sequence of positive numbers satisfying (4.0) with $a_n = A n^{-1}$, and
the assumptions of Lemma 5 with $r = 0$. Then $n^{1/2} c_n (X_n - \theta)$ is asymptotically
normally distributed with mean 0 and variance $\sigma^2 A^2 (8\alpha A - 1)^{-1}$.

Proof. With no loss of generality we assume $\alpha_0 = \theta = 0$. Abbreviating
$\delta(X_n - c_n) - \delta(X_n + c_n)$ by $\delta_n$ and using (B2), we rewrite (4.2) and obtain

(4.11)  $X_{n+1} = (1 - 4\alpha A n^{-1}) X_n - A n^{-1} c_n^{-1} \delta_n - A n^{-1} c_n^{-1} Z_n.$

Let $a = 4\alpha A$. Using the notation of (2.1) with $a_j = a j^{-1}$, iteration of (4.11)
yields

(4.12)  $X_{n+1} = \beta_{0n} X_1 - A \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m - A \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} Z_m.$

By Lemma 5 with $r = 0$ and $p = 2$ we have

(4.13)  $h_n \sim a^{-1} (2a - 1)^{1/2} n^{1/2} c_n.$

Hence, what we wish to prove is that

(4.14)  $a (2a - 1)^{-1/2} h_n X_n$ is asymptotically normal with mean 0 and variance $A^2 \sigma^2 (8\alpha A - 1)^{-1}$.

After multiplying both sides of (4.12) by $(2a - 1)^{-1/2} a h_n$ it becomes clear that
(4.14) will be proved if we can prove

(4.15a)  $h_n \beta_{0n} \to 0,$

(4.15b)  $h_n \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m \to 0$ in probability,

and

(4.15c)  $h_n \sum_{m=1}^{n} a m^{-1} c_m^{-1} \beta_{mn} Z_m$ is asymptotically normal with mean 0 and variance $\sigma^2$.

Lemma 2 shows that (4.15a) holds. We establish (4.15c) by using the same
argument used to prove (3.9c) in Theorem 1. The details being the same we omit
the argument except to note that, by Lemma 5 and (B5),

$h_n^2 \sum_{m=1}^{n} a^2 m^{-2} c_m^{-2} \beta_{mn}^2 EZ_m^2 \to \sigma^2.$

Note that up to this point the only assumptions used have been (B1), (B2),
(B5), and (B6). This observation will enable us to begin the proofs of later
theorems at the point where we have to verify (4.15b).

To establish (4.15b) we require an estimate of $EX_n^2$ which we now obtain.

Squaring both sides of (4.2), taking expected values, and making use of (B3)
and (B5) we obtain, for $n > N_0$ where $N_0$ is large enough so that $c_{N_0} \le c_0$,

(4.16)  $EX_{n+1}^2 \le (1 - 2AK_1 n^{-1} + A^2 K_2^2 n^{-2}) EX_n^2 + 4sA^2 c_n^{-2} n^{-2}.$

Let $u > 0$ be such that $2AK_1 - u > 1$ and denote $2AK_1 - u$ by $p$. Then, for
sufficiently large $n$, say $n \ge N_1$, (4.16) implies

(4.17)  $EX_{n+1}^2 \le (1 - p n^{-1}) EX_n^2 + D_1 c_n^{-2} n^{-2}.$

Put $a_j = p j^{-1}$ in (2.1) and denote the $\beta_{mn}$ thus obtained by $\beta_{mn}'$. Then, iterating
(4.17) and using (2.3) and Lemma 4 with $d_n = c_n^{-2}$ and $q = p - 2$, we obtain

(4.18)  $EX_{n+1}^2 \le \beta_{N_1 n}' EX_{N_1+1}^2 + D_1 \sum_{m=N_1+1}^{n} m^{-2} c_m^{-2} \beta_{mn}'$
        $\le D_2 n^{-p} + D_3 \sum_{m=N_1+1}^{n} m^{p-2} c_m^{-2} n^{-p} = O(c_n^{-2} n^{-1})$

which is the desired estimate.
For each integer $m$ define $\phi_m$ to be 1 if $|X_m| \le c_m$ and $\phi_m = 0$ for $|X_m| >
c_m$. Then, to prove (4.15b) it is sufficient to prove

(4.19a)  $h_n \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m \phi_m \to 0$ in probability,

(4.19b)  $h_n \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m (1 - \phi_m) \to 0$ in probability.

For $\epsilon > 0$ it is a consequence of (B4) that, for $m$ sufficiently large, say $m > N_2$,
$|\delta_m \phi_m c_m^{-1}| \le \epsilon |X_m|$. Use of Lemma 2 and a Chebyshev-type inequality now
show that (4.19a) is implied by

(4.20a)  $h_n \sum_{m=N_2}^{n} m^{-1} \beta_{mn} E^{1/2}(X_m^2) = O(1).$

Since $\delta_m c_m^{-1} = O(|X_m|)$ (a consequence of (B3)) and

$E(|X_m| (1 - \phi_m)) \le P^{1/2}\{|X_m| > c_m\}\, E^{1/2}(X_m^2)$

it follows, in similar fashion, that (4.19b) is implied by

(4.20b)  $h_n \sum_{m=1}^{n} m^{-1} \beta_{mn} E^{1/2}(X_m^2)\, P^{1/2}\{|X_m| > c_m\} = o(1).$

Since our assumptions on $\{c_n\}$ imply that $c_n n^{1/4} \to \infty$ (see the proof of Lemma 4
where it is shown that $d_n n^{q+1} \to \infty$ for $q + 1 > 0$) we have $P\{|X_m| > c_m\} \le
c_m^{-2} EX_m^2 = O(c_m^{-4} m^{-1}) \to 0$ as $m \to \infty$. Hence (4.20b) will follow from (4.20a) by
an argument like that in Lemma 3.

To show (4.20a) observe that by (2.3), (4.18), and Lemma 4 with $d_n = c_n^{-1}$
and $q = a - 3/2$ ($a - 3/2 > -1$ because our assumptions imply that $4\alpha \ge K_1$
and hence $a = 4\alpha A \ge AK_1 > 1/2$) we have, for $n > N_3 = \max(N_1, N_2)$,

(4.21)  $h_n \sum_{m=N_3}^{n} m^{-1} \beta_{mn} E^{1/2}(X_m^2) \le D_4 h_n n^{-a} \sum_{m=N_3}^{n} c_m^{-1} m^{a - 3/2} \le D_5 h_n c_n^{-1} n^{-1/2}.$

Use of (4.13) and Lemma 2 yields (4.20a) thus completing the proof of Theorem 2.
THEOREM 2'. Suppose that all the conditions of Theorem 2 are satisfied except
that (B3) is replaced by (B3'). Then $n^{1/2} c_n (X_n - \theta)$ is asymptotically normal with
mean 0 and variance $\sigma^2 A^2 (8\alpha A - 1)^{-1}$.

We omit the proof since it follows from Theorem 2 by use of the same truncation
argument used in obtaining Theorem 1' from Theorem 1.

It is easily checked that, for any sequence $\{f_n\}$ of positive numbers approaching
0, there exists a sequence $\{c_n\}$ satisfying the conditions of Theorem 2 and
such that $c_n \ge f_n$. Thus Theorem 2 says that under its conditions we can always
find sequences $\{a_n\}$ and $\{c_n\}$ satisfying (4.0) and such that $X_n$ is arbitrarily close
to being $O_p(n^{-1/2})$ without ever attaining it. The question then arises as to
whether it is possible to choose $\{a_n\}$ and $\{c_n\}$ satisfying (4.0) and such that
$X_n = O_p(n^{-1/2})$. The answer to this is, in general, negative. To see this we
give the following example.

Example 1. Let $M(x) = -x^2/4$. For each $x$ let $Z(x)$ be normally distributed
with mean 0 and variance 1/2, and let $Z(X_m - c_m)$ and $Z(X_m + c_m)$ be
independent. Then $\{Z_m\}$ is a sequence of independent normal random variables with
mean 0 and variance 1. Note also that $\{Z_m\}$ and $X_1$ are independent. Let $\{a_n\}$
and $\{c_n\}$ be sequences of positive numbers satisfying (4.0). We now show that
it is impossible that, for any infinite sequence $\{n_k\}$ of distinct integers,

(4.22)  $\lim_{y \to \infty} \lim_{k \to \infty} P\{n_k^{1/2} |X_{n_k+1}| < y\} = 1.$

For, if it were possible, writing $X_{n+1} = \beta_{0n} X_1 - \sum_{m=1}^{n} a_m c_m^{-1} \beta_{mn} Z_m$, we would
have

$\lim_{y \to \infty} \lim_{k \to \infty} P\left\{ n_k^{1/2} \left| \sum_{m=1}^{n_k} a_m c_m^{-1} \beta_{m n_k} Z_m \right| < y \right\} = 1$

which, by the normality of $Z_m$, implies that

$n_k \sum_{m=1}^{n_k} a_m^2 c_m^{-2} \beta_{m n_k}^2 = O(1).$

But this is impossible by Lemma 1 and the fact that

$\left( \sum_{m=1}^{n_k} a_m c_m^{-1} \beta_{m n_k} \right)^2 \le n_k \sum_{m=1}^{n_k} a_m^2 c_m^{-2} \beta_{m n_k}^2.$
For Theorem 3 we drop (B3) and (B4) and substitute in their place

Assumption (B7). There exist positive numbers $\epsilon$, $c_0$, and $K_1$ with $\epsilon > c_0$ such
that, for all $c \le c_0$ and all $x$ satisfying $c < |x - \theta| < \epsilon$,

(4.23)  $(x - \theta)[M(x - c) - M(x + c)] c^{-1} \ge K_1 (x - \theta)^2.$

(B2) and (B7) are both implied by the condition (which we refer to hereafter
as the derivative condition) that $M$ has a continuous second derivative in a
neighborhood of $\theta$ with $M''(\theta) = -2\alpha$. Under (B1), the derivative condition,
(B5), and the assumption that (3.6) hold for all $\nu > 0$ and some $\epsilon > 0$, Burkholder
produces, for every $t < 1/4$, sequences $\{a_n\}$ and $\{c_n\}$ for which
$n^{t}(X_n - \theta)$ is asymptotically normal with mean 0 and a certain variance.
Theorem 3 shows that under weaker restrictions the same is true for $t = 1/4$.

THEOREM 3. Suppose that Assumptions (B1), (B2), (B5), (B6) and (B7) are
satisfied with $K_1 \le 4\alpha$ in (B7). Let $c_n = n^{-1/4}$ and $a_n = A n^{-1}$ where $A$ is such that
$AK_1 > 1/4$. Then $n^{1/4}(X_n - \theta)$ is asymptotically normal with mean 0 and
variance $\sigma^2 A^2 (8\alpha A - 1)^{-1}$.

Proof. Let $\alpha_0 = \theta = 0$. If we prove Theorem 3 when (B7) is strengthened so
that (4.23) holds for all $x$ satisfying $|x| > c$ and, in addition, $|M(x - c) -
M(x + c)| \le K_2 |x|$ for all $|x| > c$ and all $c \le c_0$ then, by using the truncation
device used in the proof of Theorem 1', we will be able to establish Theorem 3
with (B7) as it stands. By the remarks made in the proof of Theorem 2 we will
be finished with this proof if we can verify (4.15b).

As previously we require an estimate of $EX_n^2$, obtained as follows. Let $\phi_n = 1$
if $|X_n| \le c_n$ and $\phi_n = 0$ if $|X_n| > c_n$. Let $t > 0$. Then, it is a consequence of
(B2) that, for $n$ sufficiently large, say $n > N_1$,

(4.24)  $|\delta_n| \phi_n \le t c_n^2.$

Squaring (4.2), taking expected values, and using (B5), (B2), and the
strengthened form of (B7) yields

(4.25)  $EX_{n+1}^2 \le E\phi_n (X_n - A n^{-1} c_n^{-1} M_n)^2 + E(1 - \phi_n)(X_n - A n^{-1} c_n^{-1} M_n)^2 + D_8 n^{-2} c_n^{-2}$
        $\le E\phi_n (1 - 4\alpha A n^{-1})^2 X_n^2 + D_9 E\phi_n |\delta_n X_n| n^{-1} c_n^{-1} + D_{10} E\phi_n \delta_n^2 n^{-2} c_n^{-2}$
        $\quad + E(1 - \phi_n)(1 - 2K_1 A n^{-1} + A^2 K_2^2 n^{-2} c_n^{-2}) X_n^2 + D_8 n^{-2} c_n^{-2}.$

Let $u$ and $w$ be positive numbers such that $2K_1 A - w > 1/2$ and $2K_1 A -
w < 8\alpha A - u$, and let $2K_1 A - w$ be denoted by $p$. Choose $N_2 > N_1$ so that, if
$n > N_2$, $A^2 K_2^2 n^{-2} c_n^{-2} < w n^{-1}$ and $16\alpha^2 A^2 n^{-2} < u n^{-1}$. Then, for all $n > N_2$, we
have from (4.24) and (4.25)

(4.26)  $EX_{n+1}^2 \le (1 - p n^{-1}) EX_n^2 + D_{11} n^{-2} c_n^{-2} + D_{12} n^{-1} c_n^{2} \le (1 - p n^{-1}) EX_n^2 + D_{13} n^{-3/2}.$

Iteration of (4.26) now shows that, for $n > N_2$,

(4.27)  $EX_n^2 = O(n^{-1/2})$

which is the desired estimate.

(B2) and the fact that $X_n \to 0$ w.p.1 imply that

$\lim_{m \to \infty} \delta_m (X_m^2 + c_m^2)^{-1} = 0$

w.p.1. Hence an argument like that in Theorem 1 ((3.20) et seq.) shows that in
order to verify (4.15b) it is sufficient to prove that, for any integer $N > N_2$,

(4.28)  $h_n \sum_{m=N}^{n} m^{-1} c_m^{-1} \beta_{mn} (EX_m^2 + c_m^2) = O(1).$

Putting $c_m = m^{-1/4}$ in (4.28) and using (4.27) this follows quite easily, thus
finishing the proof of Theorem 3.
If we make some further assumptions about $M$ we will be able to improve on
the $n^{1/4}$ obtained in Theorem 3. To this end note that from (B2) we have

$|\delta(x)| \le \epsilon_x (x - \theta)^2$ where $\epsilon_x \to 0$ as $x \to \theta$.

Assumption (B8), which we now specify, is an assumption about $\epsilon_x$.

Assumption (B8). There exist positive numbers $c_0$, $\rho$, and $R$ such that, for
all $c \le c_0$,

$\sup_{|x - \theta| \le c} \epsilon_x \le R c^{\rho}.$

If $\delta(x, \theta) = O(|x - \theta|^3)$ for $x$ near $\theta$ it is easy to see that (B8) is satisfied for
appropriate $R$ and $c_0$ and $\rho = 1$; thus, the case of most interest is when $\rho = 1$.
(B8) is very closely related to Burkholder's condition of "local-evenness": for
$\rho \ge 0$, $M$ is called $\rho$-locally-even if

$\limsup_{\epsilon \to 0} f(\epsilon)\, \epsilon^{-1-\rho} < \infty$

where $f(\epsilon) = \sup\{|x| : M(x - \epsilon) - M(x + \epsilon) \le 0\}$. It is easy to verify that when
$M$ is continuous in a neighborhood of $\theta$ and (B8) is satisfied then $M$ is
$\rho$-locally-even. In fact, when $M$ satisfies the derivative condition, requiring $M$ to be
$\rho$-locally-even is equivalent to requiring $\delta(x, \theta) - \delta(-x, \theta) = O(|x - \theta|^{2+\rho})$
as $x - \theta \to 0$. The disadvantage in assuming the slightly more restrictive (B8)
rather than local-evenness is allayed by the fact that (B8) appears as a more
natural condition.

Under (B1), the derivative condition, (B5), the assumption that (3.6) hold for all $\nu > 0$ and some $\epsilon > 0$, and the assumption that $M$ is $\rho$-locally-even, Burkholder proves that, for any $t < (1 + \rho)/(4 + 2\rho)$, there exist sequences $\{a_n\}$ and $\{c_n\}$ such that $n^{t}(X_n - \theta)$ is asymptotically normal. Theorem 4 replaces the condition of local-evenness by (B8), weakens the other assumptions made by Burkholder, and gives a stronger conclusion, e.g., with $\rho = 1$ in (B8), $a_n = An^{-1}$, and $c_n = (n^{1/6} \log n)^{-1}$, Theorem 4 shows that $n^{1/3} (\log n)^{-1} (X_n - \theta)$ is asymptotically normal.
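The exponents in the comparison above can be checked by exact arithmetic. The following is a minimal sketch, not part of the original argument; the function names are hypothetical. It assumes $c_n = d_n n^{-r}$ with $r = (4 + 2\rho)^{-1}$ as in Theorem 4 and ignores the slowly varying factor $d_n$, so that the norming sequence $n^{1/2} c_n$ has polynomial part $n^{1/2 - r}$.

```python
from fractions import Fraction


def theorem4_exponents(rho):
    """Return (r, power): c_n = d_n * n**(-r), and n**power is the
    polynomial part of the norming sequence n**(1/2) * c_n."""
    rho = Fraction(rho)
    r = 1 / (4 + 2 * rho)
    return r, Fraction(1, 2) - r


def burkholder_bound(rho):
    """Upper limit (1 + rho) / (4 + 2 rho) of the exponents t in Burkholder's result."""
    rho = Fraction(rho)
    return (1 + rho) / (4 + 2 * rho)


for rho in (Fraction(1, 2), 1, 2):
    r, power = theorem4_exponents(rho)
    print(f"rho={rho}: r={r}, power={power}, Burkholder bound={burkholder_bound(rho)}")
```

For every $\rho$ the two exponents agree, since $1/2 - (4 + 2\rho)^{-1} = (1 + \rho)/(4 + 2\rho)$; the norming of Theorem 4 therefore sits at the boundary of Burkholder's range, up to the factor $d_n$.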

THEOREM 4. Suppose that Assumptions (B1), (B2), (B5), (B6), (B7), and (B8) are satisfied with $K_1 \le 4\alpha$ in (B7). Let $a_n = An^{-1}$ where $A$ is such that $AK_1 > 1$ and let $c_n = d_n n^{-r}$ with $d_n \to 0$ and satisfying (2.5) of Lemma 4 and with $r = (4 + 2\rho)^{-1}$. Then $n^{1/2} c_n (X_n - \theta)$ is asymptotically normal with mean 0 and variance $\sigma^2 A^2 (8\alpha A - 1)^{-1}$.

PROOF. Let $a_0 = \theta = 0$. By the reasoning in the first paragraph of the proof of Theorem 3 we have only to verify (4.15b). To do so we require an estimate of $EX_n^2$, which is obtained in much the same way as (4.26) is obtained. In fact, using (B8) to replace (4.24) by

\[
|\phi_n \delta_n| = O(c_n^{2+\rho}), \tag{4.29}
\]

a repetition of the argument given in Theorem 3 shows that

\[
EX_{n+1}^2 \le (1 - p n^{-1}) EX_n^2 + D_{13} n^{-1} c_n^{2+\rho} + D_{14} n^{-2} c_n^{-2}. \tag{4.30}
\]

Iterating (4.30) and applying Lemma 4 and (2.3) (note that $p = 2K_1 A - \omega > 1$ and that $p > r(2 + \rho)$) yields, for $n > N$ where $N$ is chosen sufficiently large (how large can be determined by inspecting the proof of Theorem 3),

\[
\begin{aligned}
EX_n^2 &\le O(n^{-p}) + D_{13} \sum_{m=N}^{n} m^{-1} c_m^{2+\rho} \beta_{mn} + D_{14} \sum_{m=N}^{n} m^{-2} c_m^{-2} \beta_{mn} \\
&= O(n^{-p}) + O(c_n^{2+\rho}) + O(n^{-1} c_n^{-2}) \\
&= O(c_n^{2+\rho}) + O(n^{-1} c_n^{-2}),
\end{aligned} \tag{4.31}
\]

which is the desired estimate.
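To prove (4.15b), it is sufficient to prove that for $N_1$ sufficiently large

\[
h_n \sum_{m=N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} \phi_m \delta_m \to 0 \quad \text{in probability,} \tag{4.32}
\]

\[
h_n \sum_{m=N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} (1 - \phi_m) \delta_m \to 0 \quad \text{in probability.} \tag{4.33}
\]

By (4.29), (2.7) of Lemma 5, and Lemma 4 (note that $4\alpha A > r(1 + \rho)$) we obtain, for $n > N_1$,

\[
\left| h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} \phi_m \delta_m \right| = O\left( h_n \sum_{N_1}^{n} m^{-1} c_m^{1+\rho} \beta_{mn} \right) = O(n^{1/2} c_n^{2+\rho}) \tag{4.34}
\]

which, by the choice of $\{c_n\}$, proves (4.32).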

To show (4.33) we proceed as follows. Let $\mu_m = 1$ if $|X_m| \le c_0$ ($c_0$ here is the same as in (B8)) and let $\mu_m = 0$ otherwise. Using (B8) and, from (B2), the fact that $|\delta_m| = O(|X_m|)$ if $|X_m| > c_0$, we then have, for $N_1$ sufficiently large so that in particular $c_m < c_0$ for $m \ge N_1$,

\[
\begin{aligned}
h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} (1 - \phi_m) |\delta_m|
&= h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} (1 - \phi_m) \mu_m |\delta_m| \\
&\quad + h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} (1 - \phi_m)(1 - \mu_m) |\delta_m| \\
&= O\left( h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} X_m^2 \right) + O\left( h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} (1 - \mu_m) |X_m| \right).
\end{aligned} \tag{4.35}
\]

Now

\[
h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} E\{(1 - \mu_m) |X_m|\} \le h_n c_0^{-1} \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} EX_m^2
\]

and since, by (4.31) and Lemma 4 (note that $4\alpha A > 1$),

\[
h_n \sum_{N_1}^{n} m^{-1} c_m^{-1} \beta_{mn} EX_m^2 = O(n^{1/2} c_n^{2+\rho}) + O(n^{-1/2} c_n^{-2}),
\]

our choice of $c_n$ shows that the right hand side of (4.35) goes to 0 in probability. This establishes (4.33) and finishes the proof of the theorem.

Focusing our attention for the present on the case $\rho = 1$ (this is by no means necessary since all ensuing remarks can be suited to the cases where $\rho \ne 1$), we can ask whether or not it is possible to find sequences $\{a_n\}$ and $\{c_n\}$ satisfying (4.0) and a sequence $\{g_n\}$ such that, under Assumptions (B1), (B2), and (B5) to (B8), $g_n X_{n+1}$ is asymptotically normal and $g_n^{-1} = O(n^{-1/3})$. Example 2, which we now give, shows that the answer to this question is no.

EXAMPLE 2. Let $\{a_n\}$ and $\{c_n\}$ be sequences satisfying (4.0). For $0 < C < 1/6$ let $M(x)$ be defined as follows.

\[
M(x) = \begin{cases}
-x^2/4 + x^3 & \text{if } |x| \le C \\
-x^2/4 + C^3 & \text{if } x > C \\
-x^2/4 - C^3 & \text{if } x < -C
\end{cases} \tag{4.36}
\]

For each $x$ let $Z(x)$ be normally distributed with mean 0 and variance 1/2 and let $Z(X_m - c_m)$ and $Z(X_m + c_m)$ be independently distributed. Thus $\{Z_m\}$ is a sequence of independent normal random variables with mean 0 and variance 1 and $Z_m$ and $X_{m'}$ are independent if $m \ge m'$. It is clear that (B1), (B2), (B5), (B6), (B7), and (B8) with $\rho = 1$ are all satisfied. Suppose that $\{g_n\}$ is a sequence of real numbers such that $g_n X_{n+1}$ converges in distribution to the normal distribution with mean 0 and variance $v$ with $v \ge 0$. Since $|g_n| X_{n+1}$ is then also asymptotically normal with mean 0 and variance $v$ we can assume to begin with that, for all $n$, $g_n \ge 0$. We will show that $\limsup_{n \to \infty} n^{-1/3} g_n = 0$.

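Let $\phi_{1m}, \phi_{2m}, \ldots, \phi_{5m}$ be random variables taking on the values 0 and 1 only, with the value 1 being taken on as follows:

\[
\begin{aligned}
\phi_{1m} &= 1 \quad \text{if } |X_m - c_m| \le C \text{ and } |X_m + c_m| \le C \\
\phi_{2m} &= 1 \quad \text{if } X_m - c_m > C \\
\phi_{3m} &= 1 \quad \text{if } X_m - c_m \le C < X_m + c_m \\
\phi_{4m} &= 1 \quad \text{if } X_m + c_m < -C \\
\phi_{5m} &= 1 \quad \text{if } X_m - c_m < -C \le X_m + c_m
\end{aligned}
\]

Let $N_0$ be such that, for all $m > N_0$, $c_m < C/2$ and, in addition, suppose that $N_0$ is large enough so that $a_m < 1$ for all $m > N_0$; the latter requirement is to guarantee that $\beta_{mn} > 0$ for all $n \ge m > N_0$. Since for all $m > N_0$, $\sum_{i=1}^{5} \phi_{im} = 1$, it follows from (4.36) that $m > N_0$ implies

\[
\begin{aligned}
M_m = M(X_m - c_m) - M(X_m + c_m) &= \sum_{i=1}^{5} M_m \phi_{im} \\
&= c_m X_m - 2c_m^3 \phi_{1m} - 6 c_m X_m^2 \phi_{1m} + \left( (X_m - c_m)^3 - C^3 \right) \phi_{3m} \\
&\quad - \left( C^3 + (X_m + c_m)^3 \right) \phi_{5m}.
\end{aligned}
\]

Observe that none of the last three terms is positive. Abbreviating $-a_m/c_m$ times their sum by $G_m$ we obtain from (4.2)

\[
X_{m+1} = (1 - a_m) X_m + 2 a_m c_m^2 \phi_{1m} + G_m - a_m c_m^{-1} Z_m. \tag{4.37}
\]

Iterating (4.37) we obtain, for $n > N \ge N_0$,

\[
\begin{aligned}
X_{n+1} &= \beta_{Nn} X_{N+1} + 2 \sum_{N+1}^{n} a_m c_m^2 \beta_{mn} \phi_{1m} + \sum_{N+1}^{n} \beta_{mn} G_m - \sum_{N+1}^{n} a_m c_m^{-1} \beta_{mn} Z_m \\
&= \beta_{Nn} X_{N+1} + G_{1nN} + G_{2nN} + G_{3nN}
\end{aligned} \tag{4.38}
\]

where $G_{1nN}$, $G_{2nN}$, and $G_{3nN}$ are abbreviations for the terms in the corresponding positions in the previous line and we note that $G_{2nN}$ is never negative and that $G_{3nN}$ is normally distributed with mean 0 and variance $\sum_{N+1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2$.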

We will show that $\limsup_{n \to \infty} n^{-1/3} g_n = 0$ by contradicting the assumption that there exists a positive constant $D_{15}$ and a subsequence $\{n_k\}$ such that $n_k^{-1/3} g_{n_k} \ge D_{15}$ for all $k$. We may assume that $\{n_k\}$ consists of all the positive integers since the argument below remains valid if we begin by restricting ourselves to the subsequence $\{n_k\}$ for which $n_k^{-1/3} g_{n_k} \ge D_{15}$. Let

\[
H_{nN} = \left( \sum_{N+1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2 \right)^{-1/2} \quad \text{and} \quad G_{nN} = 2 \sum_{N+1}^{n} a_m c_m^2 \beta_{mn}.
\]

We will arrive at the contradiction by showing first that the asymptotic normality of $g_n X_{n+1}$ implies that $g_n H_{nN_0}^{-1} = O(1)$ as $n \to \infty$ and $\limsup_{n \to \infty} g_n G_{nN} = o(1)$ as $N \to \infty$, and then showing (see (4.45) et seq.) the impossibility of having simultaneously $H_{nN_0}^{-1} = O(n^{-1/3})$ and $\limsup_{n \to \infty} n^{1/3} G_{nN} = o(1)$ as $N \to \infty$.

Let $E_m$ be the set $\{|X_j| \le C/2,\ j > m\}$. Since $X_m \to 0$ w.p.1, $1 - P\{E_m\} = \epsilon_m \to 0$ as $m \to \infty$. Since $c_m < C/2$ for all $m > N_0$ we have, for all such $m$,

\[
E_m \subset \{\phi_{1j} = 1,\ j > m\}. \tag{4.39}
\]

For $v \ge 0$ let $F_v$ denote the normal distribution with mean 0 and variance $v$. We consider two cases according as $v = 0$ or $v > 0$.

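Case 1: $v = 0$. To begin with we obtain from (4.38), the fact that $G_{1nN} + G_{2nN}$ is never negative, and the independence of $X_{N+1}$ and $G_{3nN}$ that, for all $n > N_0$,

\[
\begin{aligned}
P\{X_{n+1} > 0\} &= P\{\beta_{N_0 n} X_{N_0+1} + G_{1nN_0} + G_{2nN_0} + G_{3nN_0} > 0\} \\
&\ge P\{\beta_{N_0 n} X_{N_0+1} + G_{3nN_0} > 0\} \\
&\ge P\{X_{N_0+1} > 0,\ G_{3nN_0} > 0\} \\
&= \tfrac{1}{2} P\{X_{N_0+1} > 0\}.
\end{aligned} \tag{4.40}
\]

We will show that for some $N$, $\lim_{n \to \infty} g_n G_{nN} = 0$. Since $G_{nN}$ is decreasing in $N$ this will imply $\lim_{N \to \infty} \limsup_{n \to \infty} g_n G_{nN} = 0$.

Suppose that for all $N$, $\limsup_{n \to \infty} g_n G_{nN} > 0$. Then, for each $N$, there exists a positive constant $D_{16}$ and a sequence $\{n_k\}$ such that, for all $k$, $g_{n_k} G_{n_k N} > D_{16}$. Let $G^*_{nN} = g_n G_{3nN}$. Then, since $g_n X_{n+1} \to 0$ in probability we have, by (4.38), (4.39) and (4.40),

\[
\begin{aligned}
0 = 1 - F_0(D_{16}) &= \lim_{k \to \infty} P\{g_{n_k} \beta_{N n_k} X_{N+1} + G^*_{n_k N} + g_{n_k} G_{1 n_k N} + g_{n_k} G_{2 n_k N} > D_{16};\ E_N\} \\
&\ge \lim_{k \to \infty} P\{g_{n_k} \beta_{N n_k} X_{N+1} + G^*_{n_k N} > 0;\ E_N\} \\
&\ge \lim_{k \to \infty} P\{X_{N+1} > 0,\ G^*_{n_k N} > 0\} - \epsilon_N \\
&\ge \tfrac{1}{2} P\{X_{N+1} > 0\} - \epsilon_N.
\end{aligned} \tag{4.41}
\]

Since $N$ can be chosen large enough so that the right-hand side of (4.41) is strictly positive we have a contradiction, thus proving that $g_n G_{nN} = o(1)$ for some $N$.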
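To show that $g_n H_{nN_0}^{-1} = O(1)$ assume, to the contrary, that $g_{n_k} H_{n_k N_0}^{-1} \to \infty$ for some sequence $\{n_k\}$. Since $H_{nN_0} G_{3nN_0}$ is normally distributed with mean 0 and variance 1 we would have, for any $y$,

\[
\lim_{k \to \infty} P\{G^*_{n_k N_0} > y\} = \tfrac{1}{2}.
\]

Hence

\[
\begin{aligned}
1 - F_0(y) &\ge \lim_{k \to \infty} P\{g_{n_k} \beta_{N_0 n_k} X_{N_0+1} + G^*_{n_k N_0} > y\} \\
&\ge \lim_{k \to \infty} P\{G^*_{n_k N_0} > y\}\, P\{X_{N_0+1} > 0\} \\
&= \tfrac{1}{2} P\{X_{N_0+1} > 0\}
\end{aligned} \tag{4.42}
\]

which is a contradiction, thus proving that $g_n H_{nN_0}^{-1} = O(1)$.

Case 2: $v > 0$. The argument used in Case 1 to show that $g_n H_{nN_0}^{-1} = O(1)$ can be used here with (4.42) becoming

\[
1 - F_v(y) \ge \tfrac{1}{2} P\{X_{N_0+1} > 0\}.
\]

Let $y \to \infty$ and we obtain a contradiction to the assumption that $g_n H_{nN_0}^{-1}$ is not $O(1)$.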
To show that $\limsup_{n \to \infty} g_n G_{nN} = o(1)$ suppose, to the contrary, there exists a sequence $\{N_j\}$, a positive number $D_{17}$, and a sequence $\{n_{kj}\}$ such that, for all $j$, $g_{n_{kj}} G_{n_{kj} N_j} > D_{17}$. Let $T$ be a random variable independent of

\[
\{G_{3nN},\ n > N,\ N > 0\}
\]

and having $F_v$ as its distribution function. Then

\[
\sup_y \left| P\{g_n X_{n+1} < y\} - P\{T < y\} \right| < t_n \tag{4.43}
\]

where $t_n \to 0$ as $n \to \infty$. Then, letting $u_{kj} = g_{n_{kj}} \beta_{N_j n_{kj}} g_{N_j}^{-1}$, we have

\[
\begin{aligned}
1 - F_v(D_{17}) &\ge \limsup_{k \to \infty} P\{u_{kj} g_{N_j} X_{N_j+1} + G^*_{n_{kj} N_j} > 0;\ E_{N_j}\} \\
&\ge \limsup_{k \to \infty} P\{u_{kj} g_{N_j} X_{N_j+1} + G^*_{n_{kj} N_j} > 0\} - \epsilon_{N_j} \\
&\ge \limsup_{k \to \infty} P\{u_{kj} T + G^*_{n_{kj} N_j} > 0\} - \epsilon_{N_j} - t_{N_j} \\
&= \tfrac{1}{2} - \epsilon_{N_j} - t_{N_j}.
\end{aligned} \tag{4.44}
\]

For $j$ large enough it is clear that $\epsilon_{N_j} + t_{N_j} < F_v(D_{17}) - \tfrac{1}{2}$ which gives the desired contradiction.
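To conclude the argument we have to show that it is impossible to have $\lim_{N \to \infty} \limsup_{n \to \infty} n^{1/3} G_{nN} = 0$ and $n^{1/3} H_{nN_0}^{-1} = O(1)$. If it were possible we would have, for $N \ge N_0$,

\[
\begin{aligned}
\left( \sum_{m=N+1}^{n} a_m^{3/2} \beta_{mn}^{3/2} \right)^2 &\le \left( \sum_{m=N+1}^{n} a_m c_m^2 \beta_{mn} \right) \left( \sum_{m=N+1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2 \right) \\
&\le \left( \sum_{N+1}^{n} a_m c_m^2 \beta_{mn} \right) \left( \sum_{N_0+1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2 \right) \\
&\le \epsilon_{nN}\, n^{-1}
\end{aligned} \tag{4.45}
\]

where $\lim_{N \to \infty} \limsup_{n \to \infty} \epsilon_{nN} = 0$. Hence, using Hölder's inequality,

\[
\left( \sum_{N+1}^{n} a_m \beta_{mn} \right)^3 \le \left( \sum_{N+1}^{n} a_m^{3/2} \beta_{mn}^{3/2} \right)^2 (n - N) \le \epsilon_{nN} (n - N) n^{-1}. \tag{4.46}
\]

Applying Lemma 1 with $W_n = W = 1$ we conclude that, for each $N > N_0$,

\[
1 \le \limsup_{n \to \infty} \epsilon_{nN},
\]

which is impossible.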

The reason that we cannot have $n^{1/3} X_n$ asymptotically normal in Example 2 is clearly the upsetting character of $G_{1nN_0}$. The following example shows how we can obtain $n^{1/3}$, and even better, by considering the asymptotic behavior of $\{X_{n+1} - G_{1nN_0}\}$ instead of $\{X_n\}$. What this indicates is that the "bias" term, $G_{1nN_0}$, around which $X_n$ becomes rapidly concentrated, is the dominant error. Of course, the improvement in the order of convergence is of little practical use since it is $X_n - \theta$ which matters.

EXAMPLE 3. Let $M$ be as in Example 2 and let $Z(x)$ satisfy (B5) and (B6). Note that $M$ satisfies (B8) with $\rho = 1$. Let $a_n = An^{-1}$ for $A > 1$ and let $c_n = n^{-1/8} d_n$ where $d_n$ satisfies the conditions of Lemma 4 and $d_n \to 0$. We will show that $n^{1/2} c_n (X_{n+1} - G_{1nN_0})$ is asymptotically normal with mean 0 and variance $\sigma^2 A^2 (2A - 3/4)^{-1}$. By (4.36) and the kind of argument used several times before we will succeed in doing so if we can show

\[
h_n \sum_{N+1}^{n} a_m \beta_{mn} (1 - \phi_m) = o_p(1) \tag{4.41}
\]

\[
h_n \sum_{N+1}^{n} a_m \beta_{mn} X_m^2 = o_p(1) \tag{4.42}
\]

where $\phi_m$ is 1 or 0 according as $|X_m| \le C - c_m$ or not, and where $\beta_{mn} = \prod_{j=m+1}^{n} (1 - a_j)$. By use of Chebyshev's inequality (4.41) and (4.42) will be proved if we show

\[
h_n \sum_{N+1}^{n} a_m \beta_{mn} EX_m^2 = o(1). \tag{4.43}
\]

Using (4.31) and Lemma 4 (note that $A > 1$) we obtain

\[
\begin{aligned}
h_n \sum_{N+1}^{n} m^{-1} \beta_{mn} EX_m^2 &= O\left( h_n \sum_{N+1}^{n} m^{-1} c_m^3 \beta_{mn} \right) + O\left( h_n \sum_{N+1}^{n} m^{-2} c_m^{-2} \beta_{mn} \right) \\
&= O(n^{1/2} c_n^4) + O(n^{-1/2} c_n^{-1}) = o(1),
\end{aligned} \tag{4.44}
\]

which proves (4.43). Note that we can do better than $n^{1/3}$ and, in fact, we can get arbitrarily close to $n^{3/8}$.
Blum in [3] has suggested a procedure which replaces (4.1) by

\[
X_{n+1} = X_n - a_n c_n^{-1} [Y(X_n) - Y(X_n + c_n)].
\]

This was suggested mainly for the multi-dimensional case which we consider in the next section but we point out here, in Example 4, that this procedure can be rather inefficient.

EXAMPLE 4. Let $M(x) = -x^2/2$ and let $Z$ be as in Example 1. Then, using the Blum procedure,

\[
X_{n+1} = \beta_{0n} X_1 - \frac{1}{2} \sum_{1}^{n} a_m c_m \beta_{mn} - \sum_{1}^{n} a_m c_m^{-1} \beta_{mn} Z_m.
\]

We show that if $h_n X_n = O_p(1)$ then $h_n^{-1}$ cannot be $o(n^{-1/4})$. Theorem 2 shows, of course, that, for the Kiefer-Wolfowitz procedure, $h_n^{-1}$ can be almost $O(n^{-1/2})$. If $h_n^{-1} = o(n^{-1/4})$ and $X_1 < 0$ we would have

\[
\sum_{1}^{n} a_m c_m \beta_{mn} = o(n^{-1/4}), \qquad \sum_{1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2 = o(n^{-1/2}).
\]

Hence

\[
\sum_{1}^{n} a_m^{4/3} \beta_{mn}^{4/3} \le \left( \sum_{1}^{n} a_m c_m \beta_{mn} \right)^{2/3} \left( \sum_{1}^{n} a_m^2 c_m^{-2} \beta_{mn}^2 \right)^{1/3} = o(n^{-1/3}).
\]

But then

\[
\sum_{1}^{n} a_m \beta_{mn} \le \left( \sum_{1}^{n} a_m^{4/3} \beta_{mn}^{4/3} \right)^{3/4} n^{1/4} = o(1)
\]

which is impossible by Lemma 1.

Again it is the "bias" term $\sum_{1}^{n} a_m c_m \beta_{mn}$ which is the dominant error, i.e., the Blum procedure becomes rapidly concentrated about the wrong value just as in Example 3.
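The bias described in Example 4 can be seen in a noise-free iteration. The sketch below is illustrative only and is not part of the original argument: it drops the random term, takes $M(x) = -x^2/2$ as in Example 4, and assumes the particular sequences $a_n = 1/n$ and $c_n = n^{-1/4}$, which are chosen here purely for the illustration. The function names are hypothetical.

```python
def M(x):
    """Regression function of Example 4 (maximum at theta = 0)."""
    return -0.5 * x * x


def blum_step(x, a, c):
    """One-sided difference step, with the noise term omitted."""
    return x - (a / c) * (M(x) - M(x + c))


def kw_step(x, a, c):
    """Two-sided (Kiefer-Wolfowitz) difference step, with the noise term omitted."""
    return x - (a / c) * (M(x - c) - M(x + c))


def iterate(step, n_steps, x1=-1.0):
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n        # assumed gain sequence a_n = 1/n
        c_n = n ** -0.25     # assumed spacing c_n = n^(-1/4)
        x = step(x, a_n, c_n)
    return x


N = 10_000
x_blum = iterate(blum_step, N)
x_kw = iterate(kw_step, N)
print("one-sided:", x_blum, " two-sided:", x_kw, " c_N:", N ** -0.25)
```

With these choices the two-sided iterate settles at $\theta = 0$, while the one-sided iterate stays at a distance of the same order as $c_N$ from it, which is the deterministic counterpart of the term $\tfrac{1}{2}\sum a_m c_m \beta_{mn}$.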


5. Multi-Dimensional Procedures. In this section we consider multi-dimensional analogues of the Robbins-Monro and Kiefer-Wolfowitz procedures. Since the theorems and proofs for the multi-dimensional case are quite similar to those for the one-dimensional case considered in Sections 3 and 4 we will not go into great detail in this section. We first consider a $q$-dimensional analogue of the Robbins-Monro procedure identical with the one considered by Blum [3]. The $q$-dimensional analogue of the Kiefer-Wolfowitz procedure considered next differs somewhat from the procedure given by Blum; the differences are pointed out below. At the end of the section we remark on some more general $q$-dimensional analogues.

Let $x$ be a $q$-vector and let $M$ be a vector-valued function of $x$ with $M(x)$ also being a $q$-vector. Let $\alpha$ be a vector and let $\theta$ be a solution of the equation $M(x) = \alpha$. Let $Y(x)$ be a vector random variable with $EY(x) = M(x)$. The Robbins-Monro procedure for "locating" $\theta$ is given as follows.

Let $\{a_n\}$ be a sequence of positive real numbers such that

\[
\sum a_n = \infty, \qquad \sum a_n^2 < \infty. \tag{5.0}
\]

Let $X_1$ be an arbitrary vector (as in Section 3, $X_1$ can actually be taken to be a random variable) and define $\{X_n, n \ge 2\}$ by the recursion

\[
X_{n+1} = X_n - a_n (Y(X_n) - \alpha) \tag{5.1}
\]

where $Y(X_n)$ is a random variable whose conditional distribution given $X_1 = x_1, \ldots, X_n = x_n$ is the same as $Y(x_n)$. Writing $Y(x) = M(x) + Z(x)$ we obtain from (5.1)

\[
X_{n+1} = X_n - a_n (M(X_n) - \alpha) - a_n Z(X_n) \tag{5.2}
\]

where, as before, the conditional distribution of $Z(X_n)$ given $X_1 = x_1, \ldots, X_n = x_n$ is the same as the distribution of $Z(x_n)$ and

\[
E(Z(X_n) \mid X_1, \ldots, X_n) = 0 \tag{5.3}
\]

w.p.1.
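The recursion (5.1) is simple to simulate. The following is a minimal sketch under stated assumptions, not a procedure taken from the paper: the test problem (a linear $M$ with a hypothetical matrix `B`, target `ALPHA`, root `THETA`, and independent standard normal noise) and all function names are introduced here only for illustration, and the gain is $a_n = A/n$.

```python
import random


def robbins_monro(observe, alpha, x1, A, n_steps):
    """Iterate X_{n+1} = X_n - a_n (Y(X_n) - alpha) with a_n = A / n."""
    x = list(x1)
    for n in range(1, n_steps + 1):
        y = observe(x)                      # noisy observation Y(X_n)
        a_n = A / n
        x = [xi - a_n * (yi - ai) for xi, yi, ai in zip(x, y, alpha)]
    return x


# Hypothetical test problem: M(x) = alpha + B (x - theta), Gaussian noise.
THETA = [1.0, -2.0]
ALPHA = [0.5, 0.5]
B = [[2.0, 0.5], [0.5, 1.0]]   # positive definite; smallest eigenvalue about 0.79
rng = random.Random(0)


def observe(x):
    d = [x[0] - THETA[0], x[1] - THETA[1]]
    m = [ALPHA[i] + B[i][0] * d[0] + B[i][1] * d[1] for i in range(2)]
    return [mi + rng.gauss(0.0, 1.0) for mi in m]


estimate = robbins_monro(observe, ALPHA, x1=[0.0, 0.0], A=1.0, n_steps=20_000)
print(estimate)
```

Here $A = 1$ is chosen so that $A$ times the smallest eigenvalue of `B` exceeds $1/2$, the condition under which the $n^{1/2}$ norming applies in the theorem that follows.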


The assumptions we make now are easily seen to correspond to the assumptions made in Section 3, with (A2*) corresponding, of course, to (A2'). The notation we use is the same as that adopted in Section 2 for Lemma 6.

ASSUMPTION (A1*). $M$ is Borel-measurable, $M(\theta) = \alpha$, and, for every $\epsilon > 0$,

\[
\inf_{1/\epsilon > |x - \theta| > \epsilon} [x - \theta, M(x) - \alpha] > 0.
\]

(A1*) is satisfied, for example, if (A3*) is satisfied with $\delta = 0$ which is, of course, much stronger than needed.

ASSUMPTION (A2*). There exists a positive constant $K_1$ such that, for all $x$,

\[
|M(x) - \alpha| \le K_1 |x - \theta|.
\]

ASSUMPTION (A3*). For all $x$

\[
M(x) = \alpha + B(x - \theta) + \delta(x, \theta)
\]

where $B$ is a positive definite $q \times q$ matrix and $|\delta(x, \theta)| = o(|x - \theta|)$ as $x - \theta \to 0$.

ASSUMPTION (A4*).

\[
\sup_x E|Z(x)|^2 < \infty \tag{5.4}
\]

\[
\lim_{x \to \theta} EZ(x)Z'(x) = \pi \tag{5.5}
\]

where $\pi$ is a non-negative definite matrix and where the limit is in the sense of the norm we have defined.
ASSUMPTION (A5*).

\[
\lim_{R \to \infty} \lim_{\epsilon \to 0+} \sup_{|x - \theta| < \epsilon} \int_{\{|Z(x)| > R\}} |Z(x)|^2 \, dP = 0
\]

The remarks concerning Assumption (A5) in Section 3 also pertain here: we use (A5*) in conjunction with the convergence of $X_n$ to $\theta$ w.p.1 (a consequence of (A1*), (A2*) and (5.4)) and (5.4) only to obtain

\[
\lim_{R \to \infty} \sup_k \int_{\{|Z(X_k)| > R\}} |Z(X_k)|^2 \, dP = 0. \tag{5.6}
\]

As before, with $Z(x)$ considered as a vector, (3.5) and (3.6) imply (A5*).

Let $b_1, \ldots, b_q$ denote the eigenvalues of $B$ in decreasing order. Write $B = PDP^{-1}$ where $P$ is orthogonal and $D$ is the diagonal matrix whose diagonal elements are $b_1, \ldots, b_q$. Observe that $\inf_{|x|=1} [Bx, x] = b_q$, $\inf_{|x|=1} [Bx, Bx] = b_q^2$, and $\|B\| = b_1$. Let $\pi_{ij}$ be the $(i, j)$th element of $\pi$ and let $\pi^*_{ij}$ be the $(i, j)$th element of $\pi^* = P^{-1} \pi P$.

THEOREM 5. Suppose that Assumptions (A1*) through (A5*) are satisfied. Let $a_n = An^{-1}$ where $A$ is such that $Ab_q > \tfrac{1}{2}$. Then $n^{1/2}(X_n - \theta)$ is asymptotically normal with mean 0 and covariance matrix $PQP^{-1}$ where $Q$ is the matrix whose $(i, j)$th element is $A^2 (Ab_i + Ab_j - 1)^{-1} \pi^*_{ij}$.

PROOF. Let $\alpha = \theta = 0$. Let $u = P^{-1}x$, $M^*(u) = P^{-1}M(Pu)$, $\delta^*(u) = P^{-1}\delta(Pu)$, and $Z^*(u) = P^{-1}Z(Pu)$. Then, with $x$, $M$, $\delta$, and $Z$ being replaced by $u$, $M^*$, $\delta^*$, and $Z^*$ respectively, it is easy to see that (A1*) through (A5*) are satisfied with $B$ replaced by $D$ and $\pi$ replaced by $\pi^*$ and that (5.2) is transformed into another Robbins-Monro procedure with $\alpha$ replaced by $P^{-1}\alpha$. Thus, in order to prove the theorem it is sufficient to prove that, when $B$ is diagonal, $n^{1/2} X_n$ is asymptotically normal with mean 0 and covariance matrix

\[
((A^2 (Ab_i + Ab_j - 1)^{-1} \pi_{ij})).
\]

(A1*), (A2*), and (5.4) imply that $X_n$ converges to 0 w.p.1 (this follows from Dvoretzky's theorem; Blum's earlier proof of convergence w.p.1 is under stronger assumptions) and, hence, using (A3*), an argument like that in Theorem 1' shows that we can add the additional restriction that there exists a positive constant $K$ such that $AK > \tfrac{1}{2}$, $K < b_q$, and, for all $x$,

\[
[M(x), x] \ge K |x|^2. \tag{5.7}
\]

The proof proceeds now just as in Theorem 1. Iterating (5.2) and using (A3*) we obtain

\[
X_{n+1} = \beta_{0n} X_1 - A \sum_{m=1}^{n} m^{-1} \beta_{mn} \delta_m - A \sum_{m=1}^{n} m^{-1} \beta_{mn} Z_m \tag{5.8}
\]

where

\[
\beta_{mn} = \prod_{j=m+1}^{n} (I - A j^{-1} B). \quad \text{Let } h_n = \left( \sum_{m=1}^{n} A^2 m^{-2} \|\beta_{mn}\|^2 \right)^{-1/2}.
\]

Since $\|\beta_{mn}\| = (1 + \epsilon_m)(mn^{-1})^{Ab_q}$ where $\epsilon_m \to 0$ as $m \to \infty$, we have $h_n \sim (2Ab_q - 1)^{1/2} A^{-1} n^{1/2}$. Making use of (5.7) and the same argument as used to obtain (3.18) in Theorem 1 we obtain $E|X_m|^2 = O(m^{-1})$. It then follows just as in Theorem 1 that

\[
h_n \left( \beta_{0n} X_1 - A \sum_{m=1}^{n} m^{-1} \beta_{mn} \delta_m \right) \to 0
\]

in probability.

To conclude the proof we will apply Lemma 6 with $U_{nk} = A h_n k^{-1} \beta_{kn} Z_k$. Just as in Theorem 1 we obtain quite readily that (2.19) and (2.20) are satisfied with this choice of $U_{nk}$. Since $|\beta_{kn} x| \le (1 + \epsilon_k)(kn^{-1})^{Ab_q} |x|$ it follows from (A5*) in the same way as in Theorem 1 that (2.21) is satisfied. We have only to compute $\lim_{n \to \infty} s_n$ to be finished. Let the $(i, j)$th elements of $EZ_k Z_k'$ and $s_n$ be denoted by $\pi_{ij}^{(k)}$ and $s_{ij}^{(n)}$ respectively. Let $\beta_{imn} = \prod_{j=m+1}^{n} (1 - Ab_i j^{-1})$. Then

\[
s_{ij}^{(n)} = A^2 h_n^2 \sum_{k=1}^{n} k^{-2} \beta_{ikn} \beta_{jkn} \pi_{ij}^{(k)}.
\]

Since $\pi_{ij}^{(k)} \to \pi_{ij}$ and $h_n^2 \sim (2Ab_q - 1) A^{-2} n$ it follows that

\[
s_{ij}^{(n)} \to (2Ab_q - 1)(Ab_i + Ab_j - 1)^{-1} \pi_{ij}.
\]

Thus, when $B$ is diagonal, $n^{1/2} X_n$ is asymptotically normal with mean 0 and covariance matrix $((A^2 (Ab_i + Ab_j - 1)^{-1} \pi_{ij}))$, and this finishes the proof of the theorem.

We will now take up the multi-dimensional Kiefer-Wolfowitz procedure. Let $x$ be a $q$-vector and let $f$ be a real valued function of $x$. Let $y(x)$ be a real random variable with $Ey(x) = f(x)$. We will consider the following $q$-dimensional version of the Kiefer-Wolfowitz procedure for finding the point at which $f$ has a maximum.

Let $\{a_n\}$ and $\{c_n\}$ be two sequences of positive real numbers satisfying

\[
\sum a_n = \infty, \qquad \sum a_n^2 c_n^{-2} < \infty, \qquad \lim c_n = 0. \tag{5.9}
\]

For $1 \le i \le q$ let $e_i$ be the $q$-vector whose $i$th coordinate is 1 and whose other coordinates are 0. Let $Y(x, a) = (y(x + ae_1), \ldots, y(x + ae_q))$. Let $X_1$ be an arbitrary $q$-vector and define $\{X_n, n \ge 2\}$ by the recursion

\[
X_{n+1} = X_n - a_n c_n^{-1} (Y(X_n, -c_n) - Y(X_n, c_n)) \tag{5.10}
\]

where the conditional distribution of $Y(X_n, \pm c_n)$ given $X_1 = x_1, \ldots, X_n = x_n$ is the same as $Y(x_n, \pm c_n)$. Writing $y(x) = f(x) + z(x)$, and letting

\[
\begin{aligned}
M(x, a) &= (f(x + ae_1), \ldots, f(x + ae_q)), \\
Z(x, a) &= (z(x + ae_1), \ldots, z(x + ae_q)),
\end{aligned}
\]

we rewrite (5.10) and obtain

\[
X_{n+1} = X_n - a_n c_n^{-1} (M(X_n, -c_n) - M(X_n, c_n)) - a_n c_n^{-1} (Z(X_n, -c_n) - Z(X_n, c_n)). \tag{5.11}
\]

We will denote $M(X_n, -c_n) - M(X_n, c_n)$ by $M_n$ and $Z(X_n, -c_n) - Z(X_n, c_n)$ by $Z_n$. It is clear that just as in Section 4

\[
E(Z_{n+1} \mid Z_1, \ldots, Z_n) = E(Z_{n+1} \mid X_1, \ldots, X_{n+1}) = 0 \tag{5.12}
\]

w.p.1.

The procedure we have defined by (5.10) differs from the one considered by Blum [3] in that Blum uses $Y(X_n, 0) - Y(X_n, c_n)$ rather than $Y(X_n, -c_n) - Y(X_n, c_n)$. The advantage of the Blum procedure is that it requires at each stage $q + 1$ observations whereas the number of observations required by (5.10) at each stage is $2q$. However, as noted in Example 4, the Blum procedure is quite inefficient with respect to the rate at which it converges to $\theta$.
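One step of the recursion (5.10), with its $2q$ observations, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own construction: the quadratic test function with hypothetical `THETA` and `B_DIAG`, the standard normal noise, and the sequences $a_n = 0.5/n$ and $c_n = n^{-1/6}$ are all choices made here for the example, and the names are hypothetical.

```python
import random


def kw_step(y, x, a_n, c_n):
    """One step of X_{n+1} = X_n - (a_n / c_n) (Y(X_n, -c_n) - Y(X_n, c_n)).

    y is a noisy evaluation of f; the step uses 2q evaluations for a q-vector x.
    """
    new_x = []
    for i in range(len(x)):
        lower = list(x)
        lower[i] -= c_n                     # x - c_n e_i
        upper = list(x)
        upper[i] += c_n                     # x + c_n e_i
        new_x.append(x[i] - (a_n / c_n) * (y(lower) - y(upper)))
    return new_x


# Hypothetical test problem: f(x) = a0 - [B(x - theta), x - theta], B diagonal.
THETA = [1.0, -2.0]
B_DIAG = [1.0, 2.0]
rng = random.Random(1)
calls = [0]                                 # counts observations of y


def y(x):
    calls[0] += 1
    f = 3.0 - sum(b * (xi - t) ** 2 for b, xi, t in zip(B_DIAG, x, THETA))
    return f + rng.gauss(0.0, 1.0)


x = [0.0, 0.0]
N = 20_000
for n in range(1, N + 1):
    x = kw_step(y, x, a_n=0.5 / n, c_n=n ** (-1.0 / 6.0))
print(x, calls[0])
```

The counter confirms the cost of $2q$ observations per stage; a one-sided variant in the spirit of Blum's procedure would reuse a single evaluation at $X_n$ and so need only $q + 1$.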

We now list the assumptions we require. The correspondence between these 
assumptions and those of Section 4 is easy to see. 

AssumPTION (B1*). f is Borel-measurable, has a unique maximum at z = 8, 
\f(z + 1) — f(x) | Ss D, + D,| z| for some positive constants D; and D2, and, 
forr0 <a <a<aea< @, 

inf e [M(z, —e) — M(z, 6-), xz — 6) >0 


€,5|2-O0|<ee 
0<eSeg 


(B1*) is satisfied, for example, if (B2*) is satisfied with $\delta = 0$; of course, this is much stronger than is needed.

ASSUMPTION (B2*). For all $x$

\[
f(x) = a_0 - [B(x - \theta), x - \theta] + \delta(x, \theta)
\]

where $a_0$ is real, $B$ is a positive definite $q \times q$ matrix, and $\delta(x, \theta) = o(|x - \theta|^2)$ as $x - \theta \to 0$.

ASSUMPTION (B3*). There exist positive numbers $K_1$, $K_2$, and $c_0$ such that, for all $x$ in some neighborhood of $\theta$ and all $c$ with $0 < c \le c_0$,

\[
K_1 |x - \theta|^2 \le [x - \theta, (M(x, -c) - M(x, c))/c] \le K_2 |x - \theta|^2
\]


and, for all z, 


[M(z, ~e) — Mia, 0) 


<s K; wie 


AssuMPTION (B4*). If co > 0 then, for all z and c such that |x| <¢ < & 
(6(x, —c, 6) — 8(x, +c, 6))/e = o( |x — 6] ) 
ASSUMPTION (B5*).

\[
\sup_x E|Z(x, 0)|^2 < \infty \tag{5.13}
\]

\[
\lim_{x \to \theta,\ c \to 0} E(Z(x, -c) - Z(x, c))(Z(x, -c) - Z(x, c))' = \pi \tag{5.14}
\]

where $\pi$ is a non-negative definite matrix.

ASSUMPTION (B6*).

\[
\lim_{R \to \infty} \lim_{\epsilon \to 0+} \sup_{|x - \theta| < \epsilon} \int_{\{|Z(x, 0)| > R\}} |Z(x, 0)|^2 \, dP = 0
\]

As before we use (B6*) to obtain

\[
\lim_{R \to \infty} \sup_k \int_{\{|Z_k| > R\}} |Z_k|^2 \, dP = 0 \tag{5.15}
\]

and, as before, (B6*) is implied by (3.5) or (3.6), with $Z(x)$ considered as a vector of course.

ASSUMPTION (B7*). There exist positive numbers $\epsilon$, $c_0$, and $K_1$ such that, for all $c \le c_0$ and all $x$ satisfying $c < |x - \theta| < \epsilon$,

\[
[x - \theta, (M(x, -c) - M(x, c))/c] > K_1 |x - \theta|^2.
\]

Let $\delta(x, \theta) = \epsilon_x |x - \theta|^2$.

ASSUMPTION (B8*). There exist positive numbers $c_0$, $\rho$, and $R$ such that for all $c \le c_0$

\[
\sup_{|x - \theta| < c} \epsilon_x \le R c^{\rho}.
\]

As in the paragraph preceding Theorem 5 let $B = PDP^{-1}$ and let $((\pi^*_{ij})) = \pi^* = P^{-1} \pi P$.

THEOREM 6. Suppose Assumptions (B1*) through (B6*) are satisfied. Let $AK_1 > 1/2$ and choose $a_n = An^{-1}$. Let $\{c_n\}$ be a sequence of positive numbers satisfying (5.9) with $a_n = An^{-1}$ and the assumptions of Lemma 5 with $r = 0$. Then $n^{1/2} c_n (X_n - \theta)$ is asymptotically normal with mean 0 and covariance matrix $PQP^{-1}$ where $Q = ((A^2 (4Ab_i + 4Ab_j - 1)^{-1} \pi^*_{ij}))$.

PROOF. Let $a_0 = \theta = 0$. (B1*) and (5.13) imply that $X_n$ converges to 0 w.p.1 (this is a consequence of Dvoretzky's theorem [7]). Hence, an argument like that in Theorem 1' shows that (B3*) can be strengthened so that it holds for all $x$. Rewriting (5.11) by using (B2*) and letting $a = 4A$ we obtain

\[
X_{n+1} = (I - an^{-1}B) X_n - An^{-1} c_n^{-1} \delta_n - An^{-1} c_n^{-1} Z_n. \tag{5.16}
\]

Let $\beta_{mn} = \prod_{j=m+1}^{n} (I - aj^{-1}B)$. Then, iteration of (5.16) yields

\[
X_{n+1} = \beta_{0n} X_1 - A \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m - A \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} Z_m. \tag{5.17}
\]

It is easy to verify that

\[
\|\beta_{mn}\| = \|P^{-1} \beta_{mn} P\| = \left\| \prod_{j=m+1}^{n} (I - aj^{-1}D) \right\| \sim m^{ab_q} n^{-ab_q}. \tag{5.18}
\]

Also, letting $\pi_m = EZ_m Z_m'$, $D_{mn} = \prod_{j=m+1}^{n} (I - aj^{-1}D)$, and

\[
Q_n = \left(\left( A^2 n c_n^2 \sum_{m=1}^{n} m^{-2} c_m^{-2} m^{a(b_i + b_j)} n^{-a(b_i + b_j)} \pi^*_{ij} \right)\right),
\]

and using (5.14), an argument like that in Lemma 3, and Lemma 4, observe that

\[
\begin{aligned}
&\lim_{n \to \infty} \left\| A^2 n c_n^2 \sum_{m=1}^{n} m^{-2} c_m^{-2} \beta_{mn} \pi_m \beta_{mn}' - PQP^{-1} \right\| \\
&\quad = \lim_{n \to \infty} \left\| A^2 n c_n^2 \sum_{m=1}^{n} m^{-2} c_m^{-2} D_{mn} P^{-1} \pi_m P D_{mn} - Q \right\| \\
&\quad = \lim_{n \to \infty} \left\| A^2 n c_n^2 \sum_{m=1}^{n} m^{-2} c_m^{-2} D_{mn} \pi^* D_{mn} - Q \right\| \\
&\quad = \lim_{n \to \infty} \|Q_n - Q\| = 0.
\end{aligned} \tag{5.19}
\]

To prove Theorem 6 we now proceed as in the proof of Theorem 2 and show

\[
n^{1/2} c_n \beta_{0n} \to 0 \tag{5.20a}
\]

\[
n^{1/2} c_n \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} \delta_m \to 0 \quad \text{in probability} \tag{5.20b}
\]

\[
A n^{1/2} c_n \sum_{m=1}^{n} m^{-1} c_m^{-1} \beta_{mn} Z_m \ \text{is asymptotically normal with mean 0 and covariance matrix } PQP^{-1}. \tag{5.20c}
\]

(5.20a) and (5.20b) are proved in the same way that (4.15a) and (4.15b) in Theorem 2 are proved; the only way that $\beta_{mn}$ need enter in this parallel proof is through its norm which we have calculated in (5.18). To show (5.20c) let $U_{nk} = A n^{1/2} c_n k^{-1} c_k^{-1} \beta_{kn} Z_k$ and observe that by (B5*), (B6*), and (5.18) all the conditions of Lemma 6 are satisfied with $s = PQP^{-1}$. This completes the proof of Theorem 6.

THEOREM 7. Suppose (B1*), (B2*), (B5*), (B6*), and (B7*) are satisfied with $K_1 < 4b_q$ in (B7*). Let $c_n = n^{-1/4}$ and $a_n = An^{-1}$ where $A$ is such that $AK_1 > 1/4$. Then $n^{1/4}(X_n - \theta)$ is asymptotically normal with mean 0 and covariance matrix $PQP^{-1}$.

We omit the proofs of Theorem 7 and Theorem 8 below since they proceed from the proofs of Theorems 3 and 4 in the same fashion that the proof of Theorem 6 did from that of Theorem 2.

THEOREM 8. Suppose that (B1*), (B2*), and (B5*) through (B8*) are satisfied with $K_1 \le 4b_q$ in (B7*). Let $a_n = An^{-1}$ where $AK_1 > 1$ and let $\{c_n\}$ satisfy the conditions of Lemma 5 with $d_n \to 0$ and with $r = (4 + 2\rho)^{-1}$. Then $n^{1/2} c_n (X_n - \theta)$ is asymptotically normal with mean 0 and covariance matrix $PQP^{-1}$.

The procedures given by (5.1) and (5.10) can be generalized if we replace $\{a_n\}$ by a sequence $\{T_n\}$ of matrices. When $\{T_n\}$ is a sequence of positive definite matrices such that, for all $n$, $B$ and $T_n$ are diagonalized by the same orthogonal matrix $P$, and when the smallest and largest eigenvalues of $T_n$, denoted by $t_n^{*}$ and $t_n^{**}$ respectively, satisfy (5.0) and (5.9) with $a_n$ replaced by $t_n^{*}$ and $t_n^{**}$, methods like those used in the earlier part of this section and in earlier sections can be used to study the asymptotic behavior of these procedures. Indeed, if $T_n = n^{-1} T$ where $T$ is a positive definite matrix which is diagonalized by $P$, results like those proved in the earlier part of this section can be obtained by using the same methods as used in obtaining these results. When, for $A > 0$, $T = AI$, we are in the situation covered by those theorems. In studying (5.10) (the Kiefer-Wolfowitz procedure) we can, in addition, replace $\{c_n\}$ by a sequence $\{C_n\}$ of matrices; the remarks about $\{T_n\}$ are also relevant to $\{C_n\}$.

Since Examples 1 and 2 of Section 4 can be extended to their $q$-dimensional analogues we cannot hope to improve materially the results of Theorems 5, 6, 7, and 8 by using sequences $\{T_n\}$ which satisfy the second sentence of the preceding paragraph and which are more general than $\{a_n I\}$. However, if we knew $B$ and $\pi$, then, by suitable choice of such $\{T_n\}$ we can, in general, obtain a limiting covariance matrix of smaller size than is obtainable by using merely $\{a_n I\}$. As an indication of this suppose that we are concerned with a two-dimensional Robbins-Monro procedure satisfying Assumptions (A1*) through (A5*) with

\[
\pi = \begin{pmatrix} \pi_{11} & 0 \\ 0 & \pi_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_1 & 0 \\ 0 & b_2 \end{pmatrix}.
\]

Letting

\[
T = \begin{pmatrix} t_1 & 0 \\ 0 & t_2 \end{pmatrix}
\]

where both $t_1 b_1$ and $t_2 b_2$ are larger than 1/2, we compute the limiting covariance matrix to be

\[
\lim_{n \to \infty} n \sum_{m=1}^{n} m^{-2} \prod_{p=m+1}^{n} (I - p^{-1} TB)^2 \, T \pi T = \begin{pmatrix} \dfrac{t_1^2 \pi_{11}}{2t_1 b_1 - 1} & 0 \\ 0 & \dfrac{t_2^2 \pi_{22}}{2t_2 b_2 - 1} \end{pmatrix}. \tag{5.21}
\]

Choosing $t_1 = 1/b_1$ and $t_2 = 1/b_2$ will minimize the entries in the matrix in (5.21). Thus, if $b_1 \ne b_2$ we can do better by using $\{n^{-1} T\}$ than by using $\{An^{-1} I\}$ since using $\{An^{-1} I\}$ would correspond to the case where $t_1 = t_2$.
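The gain from matching the step sizes to the eigenvalues can be checked numerically. The sketch below is illustrative only; it assumes that each diagonal entry of the limiting covariance has the form $t^2 \pi_{ii} / (2tb - 1)$, and the numerical values of `b` and `pi_diag` as well as the function name are hypothetical.

```python
def limiting_variance(t, b, pi_ii):
    """Diagonal entry t^2 * pi_ii / (2 t b - 1) of the limiting covariance."""
    if 2.0 * t * b <= 1.0:
        raise ValueError("the formula requires t * b > 1/2")
    return t * t * pi_ii / (2.0 * t * b - 1.0)


# Hypothetical values of the diagonal entries of B and pi.
b = [1.0, 4.0]
pi_diag = [1.0, 1.0]

# Matched gains t_i = 1 / b_i.
matched = sum(limiting_variance(1.0 / bi, bi, p) for bi, p in zip(b, pi_diag))

# Best common gain t_1 = t_2 = A over a grid with A * min(b) > 1/2.
grid = [0.51 + 0.01 * k for k in range(500)]
best_common = min(
    sum(limiting_variance(A, bi, p) for bi, p in zip(b, pi_diag)) for A in grid
)
print(matched, best_common)
```

Since $t^2/(2tb - 1)$ is minimized only at $t = 1/b$, no common value $A$ can attain the matched total when $b_1 \ne b_2$, and the grid search reflects this.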


6. Concluding Remarks. In Sections 3, 4, and 5 we have restricted ourselves 
to sequences $\{a_n\}$ of the type $a_n = An^{-1}$. It is clear that arguments like the ones
presented above can be given for cases where $a_n$ is chosen to be something other
than An” e.g., a, = An‘. Due to Examples 1 and 2 however, the results of the 
previous sections are not likely to be improved very much by using these dif- 
ferent sequences. Indeed, for the Robbins-Monro procedure it was shown in [5], 
Section 7 that under some restrictions, the Robbins-Monro procedure with 
$a_n = An^{-1}$ for a certain choice of $A$ is optimal in the sense that it is asymptotically
minimax for many weight functions. We may remark that this optimum
property can be extended with no difficulty to the multi-dimensional Robbins- 
Monro procedure. 


In [4] Burkholder considers somewhat more general processes than considered here in the sense that he permits $M(X_n)$ and $Z(X_n)$ to depend on $n$ as well as $X_n$. With some modifications of the assumptions we have made this situation can be treated using the methods of Sections 3 and 4. Procedures given by Burkholder for locating points of inflection of a regression function and for finding the maximum of a density function can also be treated using our methods.

It is sometimes of interest to study the asymptotic behavior of $M(X_n) - \alpha$ for the Robbins-Monro procedure and of $M(X_n) - a_0$ for the Kiefer-Wolfowitz procedure. It is easy to see that results about the asymptotic distribution of these quantities can be obtained from the results about the asymptotic distribution of $X_n - \theta$.

ADMISSIBILITY FOR ESTIMATION WITH QUADRATIC LOSS


By SAMUEL KARLIN

Stanford University

0. Introduction. In dealing with estimation of a single unknown parameter, the criterion most commonly employed in evaluating the worth of given estimates is a comparison of the expected square deviation of the estimates from the true value. Suppose on the basis of an observation $x$ (or series of observations) on a distribution $P(x, \omega)$ of the form $\int_{-\infty}^{x} p(\xi, \omega) \, d\mu(\xi)$ depending on an unknown parameter $\omega$ it is desired to estimate some function $h(\omega)$. The quantity $p(x, \omega)$ may be regarded as the density of $P(x, \omega)$ with respect to the completely additive measure $\mu$. A non-randomized estimate of $h(\omega)$ is described by a function of the observations $a(x)$, and when the error of an estimate is evaluated in terms of quadratic loss, the expected risk for the estimate $a(x)$ when the true parameter value is $\omega$ is calculated by means of the formula

\[
\rho(\omega, a) = \int (a(x) - h(\omega))^2 p(x, \omega) \, d\mu(x). \tag{1}
\]


The object is to select the estimate $a$ which minimizes (1) in some sense. The fact that the statistician may restrict attention only to non-randomized estimates is due to the convexity property of the loss function ([1], p. 294; [2], p. 4.3). The justification of the quadratic loss as a measure of the discrepancy of an estimate derives from the following two characteristics: (i) in the case where $a(x)$ represents an unbiased estimate of $h(\omega)$, (1) may be interpreted as the variance of $a(x)$ and, of course, fluctuation as measured by variance is very traditional in the domain of classical estimation; (ii) from a technical and mathematical viewpoint square error lends itself most easily to manipulation and computations.

Principles used to determine a particular estimate which accomplishes appropriate optimizations are related to the minimax criterion, Bayes procedures, unbiased uniformly minimum variance estimates, etc. However, one prerequisite universally acceptable as desirable for statistical procedures is the property of admissibility. An estimate $a$ is said to be admissible if there exists no other estimate $a^*$ such that $\rho(\omega, a^*) \le \rho(\omega, a)$ for all $\omega$, with strict inequality for some $\omega$. In other words, an estimating procedure is admissible if it cannot be uniformly improved upon in terms of risk by any other procedure. Certainly, no estimate should be used if we can do better by a different estimate, whatever the true state of nature. It would, therefore, be of considerable interest to establish the admissibility of some of the standard estimates employed in practice.

A more ambitious undertaking would be to try to characterize all possible admissible estimates for the case of square error. This appears to be an almost insurmountable task. On the other hand, it is relatively easy to determine complete classes of procedures for many parametric problems. In fact, whenever the density $p(x, \omega)$ possesses a monotone likelihood ratio, all possible monotone functions $a(x)$ constitute an essentially complete class of estimating procedures [3]. Nevertheless, for any multi-action problem, which includes in particular estimation, it is known that many of the members of a complete class need not be admissible [3], [4]. Furthermore, we have found that admissibility is tied very closely to the order of growth of the loss functions. Square error falls into the category which admits many monotone inadmissible estimates. For absolute error, in contrast, the likelihood that one of the usual estimates is admissible seems to be greater.

Since the general question of resolving admissibility of all estimates measured 
with respect to quadratic loss function is intrinsically difficult, it seems worth 
while to concentrate on the investigation of whether some of the most commonly 
employed classical estimates are admissible. 

In this paper we study the problem of admissibility of the usual estimates for 
three important classes of distributions. 

The first class of distributions comprises the exponential family where
$p(x, \omega) = \beta(\omega)e^{x\omega}$. The family $a_\gamma(x) = \gamma x$ is considered as possible
estimators for

$$h(\omega) = -\frac{\beta'(\omega)}{\beta(\omega)} = E_\omega(x) = \beta(\omega)\int x e^{x\omega}\,d\mu(x).$$

Usually $x$ represents a sufficient statistic based on several observations
coming from an exponential distribution. The problem examined in general is
whether $\gamma x$ is an admissible estimate of $E_\omega(x)$ measured in terms of
quadratic loss. The parameter $\omega$ is taken to vary over its natural range
$\Omega$ consisting of all $\omega$ for which $\int e^{x\omega}\,d\mu(x) < \infty$. It is well known
that the natural range $\Omega$ is an interval which may be finite or infinite. In
the case where $\Omega = (-\infty, \infty)$, it has been shown that $a_1(x) = x$ is admissible
(see [4] and [5]). The method of proof in both references rests heavily upon the
use of the Cramér-Rao inequality and associated differential inequalities. The
fact that $x$ is an unbiased estimate of $E_\omega(x)$ seems also to play a fundamental
role in this proof. It seems difficult to perceive the meaning behind the
analysis and the reasons why things work. In Section 1 we develop a direct proof
of this fact. Our methods yield the further interesting and possibly surprising
result that $\gamma x$ for any $\gamma$ satisfying $0 < \gamma \le 1$ is an admissible estimate of
$E_\omega(x)$ whenever $\mu$ possesses positive measure in the regions $x \ge 0$ and
$x \le 0$ and $\Omega = (-\infty, \infty)$. On the other hand, for any $\gamma > 1$, $\gamma x$ is not
admissible. In view of the fact that any contraction of $x$ ($\gamma x$, $0 < \gamma \le 1$)
is admissible it seems surprising that in practice one always uses the extreme
estimate of this kind. The criterion of unbiasedness traditionally has dominated
the choice of an estimate. Yet we find in several types of estimation problems
that this feature of biasing the estimate by scaling it downward is necessary to
achieve admissibility. We shall elaborate later on this phenomenon.
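
The contrast between $\gamma > 1$ and $0 < \gamma \le 1$ can be checked in closed form in the simplest member of the family. The sketch below assumes, purely for illustration, that $X$ is normal with mean $\omega$ and unit variance (so that $E_\omega(x) = \omega$ and $\Omega = (-\infty, \infty)$); the function name `risk_scaled_mean` is hypothetical and not taken from the paper.

```python
from fractions import Fraction

def risk_scaled_mean(gamma, omega):
    """Exact quadratic risk E(gamma*X - omega)^2 when X ~ N(omega, 1) (assumed model)."""
    gamma, omega = Fraction(gamma), Fraction(omega)
    # variance of gamma*X plus the squared bias (gamma - 1)*omega
    return gamma**2 + (gamma - 1) ** 2 * omega**2

omegas = [Fraction(k, 2) for k in range(-10, 11)]

# gamma = 3/2 is worse than gamma = 1 at every omega on the grid.
expansion_dominated = all(
    risk_scaled_mean(Fraction(3, 2), w) > risk_scaled_mean(1, w) for w in omegas
)

# gamma = 1/2 beats x near omega = 0 and loses far away, so neither dominates.
half_better_near_zero = risk_scaled_mean(Fraction(1, 2), 0) < risk_scaled_mean(1, 0)
half_worse_far_away = risk_scaled_mean(Fraction(1, 2), 5) > risk_scaled_mean(1, 5)
```

The crossing risk curves only show that no member with $0 < \gamma \le 1$ is dominated by another member of the same family; admissibility against all estimates is what Section 1 establishes.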

If the natural range $\Omega$ of $\omega$ is not the full infinite interval, then the full
determination of the problem of admissibility of $\gamma x$ appears to be
complicated. For the special case where

$$p_\alpha(x, \omega) = \begin{cases} \dfrac{\omega^\alpha x^{\alpha-1} e^{-\omega x}}{\Gamma(\alpha)}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

for which $\Omega = (0, \infty)$ we find that of all estimates of the form $\gamma x$ there
exists a single admissible member in this class, namely $\gamma = \alpha/(\alpha + 1)$,
which is a biased estimate of $E(x) = \alpha/\omega$ (see [4]).
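
The multiplier $\alpha/(\alpha+1)$ can be recovered from the first two moments of the gamma density. The following sketch uses the standard moments $E(x) = \alpha/\omega$ and $E(x^2) = \alpha(\alpha+1)/\omega^2$; the helper names are hypothetical, and the computation shows only that this multiplier minimizes the risk within the family $\gamma x$ at every $\omega$, not that it is admissible.

```python
from fractions import Fraction

def gamma_risk(gamma, alpha, omega):
    """Exact E(gamma*X - alpha/omega)^2 for X ~ Gamma(shape alpha, rate omega)."""
    gamma, alpha, omega = Fraction(gamma), Fraction(alpha), Fraction(omega)
    mean = alpha / omega
    second_moment = alpha * (alpha + 1) / omega**2
    # gamma^2 E(X^2) - 2 gamma E(X) * target + target^2, with target = E(X)
    return gamma**2 * second_moment - 2 * gamma * mean * mean + mean**2

def best_multiplier(alpha):
    """Minimizer of the quadratic in gamma: E(X)^2 / E(X^2) = alpha / (alpha + 1)."""
    alpha = Fraction(alpha)
    return alpha / (alpha + 1)
```

For instance, with `alpha = 3` the minimizing multiplier is `3/4`, and `gamma_risk(Fraction(3, 4), 3, w)` is smaller than `gamma_risk(1, 3, w)` for every positive `w`, since the minimizer does not depend on `w`.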


When $\omega$ naturally ranges over a finite interval, the problem of admissibility
is even more difficult. The analysis seems to depend on the rate at which
$\beta(\omega)$ tends to zero as $\omega$ approaches its boundary. For example, it is shown
later that, if $p(x, \omega) = \beta(\omega)e^{\omega x}(e^{-|x|}/2)$ for which $\Omega = (-1, 1)$ and
$\beta(\omega) = 1 - \omega^2$, then all estimates $\gamma x$ ($0 < \gamma \le \tfrac12$) are admissible
estimates of $E_\omega(x)$ while for any other $\gamma > \tfrac12$, $\gamma x$ may be uniformly
improved upon in terms of risk. In general, the set of possible values of $\gamma$
for which $\gamma x$ is admissible appears to be very sensitive to the explicit measure
$d\mu(x)$ of the exponential family and generally consists of a subinterval of the
unit interval.

The following general result concerning admissibility of $\gamma x$ is the assertion
of Theorem 1 of Section 1: if $\beta^{-\lambda}(\omega)$ is not integrable in the neighborhood of
both boundaries of $\Omega$, then $[1/(\lambda + 1)]x$ is an admissible estimate of
$E_\omega(x)$. This includes as special cases all previously known results in this
direction.

Admissibility is next investigated for the class of distributions where

$$p(x, \omega) = \begin{cases} q(\omega)r(x), & 0 \le x \le \omega, \\ 0, & x > \omega \text{ or } x < 0, \end{cases} \qquad \text{and } d\mu(x) = dx.$$

$r$ is a positive function of $x$ and $q(\omega)$ represents a normalizing constant.
This includes, in particular, extremal distributions arising from the uniform
density; e.g., $r(x) = nx^{n-1}$, $n \ge 1$, and $q(\omega) = 1/\omega^n$. We assume in what
follows that $r(x)$ is such that the integral $\int_0^\infty r(x)\,dx$ diverges. This
requires that the normalizing factor $q(\omega)$ approach zero as $\omega$ increases to
infinity. In dealing with the estimation problem it is convenient to consider
estimates of $1/[q^\alpha(\omega)]$, $\alpha > 0$, a strictly monotone increasing function of
$\omega$. Again we limit attention to estimates which are functions of a single
observation $x$. This in fact is justifiable in every sense whenever the
observation $x$ summarizes a sufficient statistic. For example, if
$x_1, \dots, x_n$ represent independent observations from a uniform density spread
on the interval $(0, \omega)$, then $\max_{1 \le i \le n}(x_i) = y$ possesses a density of
the form described above, where $r(y) = ny^{n-1}$, and the justification of basing
estimates of $\omega$ solely on $y$ is manifestly clear.

Although an unbiased estimate of $h(\omega) = 1/2q(\omega)$ is $a(x) = 1/q(x)$, the
only admissible estimate of the form $\gamma[1/2q(x)]$, $\gamma$ a constant, is obtained
for the unique value $\gamma = 3/2$. Thus, the characteristic phenomenon appears once
again to the effect that admissible estimates are obtained provided the estimate
is biased by scaling downward. The same is true when treating the problem of
estimating the function $h(\omega) = [1/q(\omega)]^\alpha$ with $\alpha > 0$. Analogous results
are also valid for the class of distributions

$$p(x, \omega) = \begin{cases} q(\omega)r(x), & x \ge \omega, \\ 0, & x < \omega, \end{cases}$$

of which

$$p(x, \omega) = \begin{cases} e^{-(x - \omega)}, & x \ge \omega, \\ 0, & x < \omega, \end{cases}$$

is a typical example.

A possible source of explanation for the excessive uses of the principle of un- 
biasedness as a basis for selecting one estimate in preference to another may be 
due to the following considerations: First, a familiar theorem due to Blackwell 
states that within the collection of all unbiased estimates there exists a uniformly 
minimum variance unbiased estimate [6]. [This is the case if the family of
densities generated by the various parameters is large enough in the sense of
forming a "complete family" ([2], p. 3.6).] This certainly lends importance and some
cognizance to the consideration of unbiased estimates. Second, in considering 
asymptotic or large sample theory, it is found that consistent estimators are, 
for large sample size, nearly unbiased. For these two reasons, a tradition de- 
manding an estimate be unbiased regardless of the sample size has become 
acceptable practice. From the point of view of admissibility this is almost uni- 
versally the wrong estimate to use. We find the desire and need to bias an esti- 
mate to insure admissibility. 

The third group of distributions studied from the point of view of estimation
is related to the important translation parameter problem. The underlying
density is assumed known except for a location parameter; that is,
$p(x, \omega) = p(x - \omega)$ and we wish to estimate $\omega$. In order for the problem to
possess the proper invariance structure we further suppose that $d\mu(x) = dx$,
$\int t\,p(t)\,dt = 0$, and $\int t^2 p(t)\,dt < \infty$. Consequently, we readily observe
that for the case of a single observation, $x$ is an unbiased estimate of $\omega$. In
the present context the relevance and justification of using the estimate rests
primarily on its characteristics of invariance with respect to translations and
only incidentally on the property of unbiasedness. With further slight
conditions we establish that $x$ is an admissible estimate of $\omega$.

For the situation of several independent observations $x_1, x_2, \dots, x_n$ the
minimum variance invariant estimate is the familiar Pitman estimate

$$a^*(x) = x_1 - \frac{\int \theta\,p(\theta)\,p(x_2 - x_1 + \theta)\,p(x_3 - x_1 + \theta)\cdots p(x_n - x_1 + \theta)\,d\theta}{\int p(\theta)\,p(x_2 - x_1 + \theta)\,p(x_3 - x_1 + \theta)\cdots p(x_n - x_1 + \theta)\,d\theta}$$

which represents the multi-observation analogue of the estimate $x$ [5]. If the
density $p(\xi)$ is assumed to possess a sufficient number of moments, the
expression for $a^*(x)$ is well-defined. Again, subject to sufficient smoothness
requirements, we will show that $a^*(x)$ is an admissible estimate of $\omega$. A
special case of this result where the parameter and observation both traverse
the set of integers was discussed by Blackwell [7]. He demonstrated in this case
that $a^*(x)$ is admissible whenever $p(\xi)$ vanishes outside a finite interval. He
also showed that without some limitations on the nature of the density $p$ the
admissibility of $a^*(x)$ is not generally valid. In connection with the
translation parameter problem, a notion of local admissibility is also examined;
this notion may possess a greater degree of applicability than indicated in the
present context.

The method of analysis in all three cases revolves about an inversion process 
which we proceed to explain in formal terms. Suppose it is desired to establish 
that a(x) is an admissible estimate of h(w) with respect to the loss function 
measured by square deviation. Assume to the contrary that $b(x)$ is an estimating
procedure which improves upon $a(x)$. This states that the inequality

$$\int [b(x) - h(\omega)]^2 p(x, \omega)\,d\mu(x) \le \int [a(x) - h(\omega)]^2 p(x, \omega)\,d\mu(x)$$

must be true for all $\omega$. Therefore

$$(3)\qquad \int [b(x) - a(x)]^2 p(x, \omega)\,d\mu(x) \le 2\int [a(x) - b(x)]\,[a(x) - h(\omega)]\,p(x, \omega)\,d\mu(x)$$

also holds for all $\omega$. In order to demonstrate that $a(x)$ is admissible, it is
enough to show that the truth of (3) is only possible provided $b(x) = a(x)$
almost everywhere with respect to $\mu$. Suppose it is possible to construct a
monotone increasing function $F(\omega)$, not necessarily bounded, with the property
that

$$\int h(\omega)\,p(x, \omega)\,dF(\omega) = a(x)\int p(x, \omega)\,dF(\omega).$$

Provided that all operations performed are legitimate, it follows that after
integrating (3) with respect to $dF$ and interchanging the order of integration

$$\int [b(x) - a(x)]^2 \left[\int p(x, \omega)\,dF(\omega)\right] d\mu(x) \le 0.$$


This implies, essentially, the desired result. Throughout what follows, we de- 
velop sufficient machinery to justify this formalism. The method may be applied 
to numerous other kinds of admissibility questions which are not studied in the 
present paper. 
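
The two algebraic facts behind this formalism can be written out explicitly. The first line below is the expansion that turns the risk inequality into (3); the second shows why the right-hand side of (3) vanishes after integration with respect to $dF$, given the defining property of $F$. This is a restatement of the steps described above, with integrability and the interchange of integrals taken for granted.

```latex
\begin{align*}
[b(x) - h(\omega)]^2
  &= [b(x) - a(x)]^2 + 2\,[b(x) - a(x)]\,[a(x) - h(\omega)] + [a(x) - h(\omega)]^2, \\
\int [a(x) - h(\omega)]\,p(x, \omega)\,dF(\omega)
  &= a(x)\int p(x, \omega)\,dF(\omega) - \int h(\omega)\,p(x, \omega)\,dF(\omega) = 0 .
\end{align*}
```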

This formalism can also be related to the concept of the optimal Bayes
procedure. If $F(\omega)$ represents a bona fide distribution and our objective is to
obtain the Bayes estimate of $h(\omega)$ with respect to quadratic loss for $F(\omega)$,
then it is a known fact that the best estimate is given by the expression

$$a(x) = \frac{\int h(\omega)\,p(x, \omega)\,dF(\omega)}{\int p(x, \omega)\,dF(\omega)}$$

(see [1], p. 299).

Unfortunately, in all the cases with which we are concerned, the relevant
$F(\omega)$ turns out to be a non-finite measure. One could then alternatively try
to approach $F(\omega)$ by a sequence of distributions such that the corresponding
estimates converge to the desired $a(x)$. Such a method of analysis for
admissibility was proposed and exploited by Lehmann and Blyth ([2], Section 4.4;
[8]). The present results might
be viewed as a refinement of this idea. 
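
The limiting idea can be seen in a standard textbook case. The sketch below assumes, for illustration only, a normal observation $X$ with mean $\omega$ and unit variance together with a normal prior of mean zero and variance `tau_sq`; the Bayes estimate is then the posterior mean `tau_sq / (1 + tau_sq) * x`, and as the prior flattens the multiplier approaches the value 1 of the estimate $x$. The function name is hypothetical.

```python
from fractions import Fraction

def bayes_multiplier(tau_sq):
    """Posterior-mean multiplier for X ~ N(omega, 1) with prior omega ~ N(0, tau_sq)."""
    tau_sq = Fraction(tau_sq)
    return tau_sq / (1 + tau_sq)

# Priors of increasing spread: tau_sq = 1, 10, 100, 1000, 10000.
multipliers = [bayes_multiplier(10**k) for k in range(5)]
```

Each proper prior gives a contraction of $x$; only the non-finite limiting measure reproduces $x$ itself, which is why a limiting or direct argument is needed.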

The extensions of these results and method to the analogous sequential estima- 
tion problem will be published subsequently. 

Finally, we wish to express our thanks to Mr. Rupert Miller for his help in the 
writing of this manuscript. 


1. Exponential family. In this section the random variable $X$ will be assumed
to be distributed according to the probability density
$dF_\omega(x) = \beta(\omega)e^{\omega x}\,d\mu(x)$. $\mu$ is a $\sigma$-finite measure defined on the real
line, and $\omega$, the unknown state of nature, belongs to the set
$\Omega = \{\omega \mid \int_{-\infty}^{\infty} e^{\omega x}\,d\mu(x) < \infty\}$ which is an interval of the real
line. Let $\bar\omega$ and $\underline\omega$ be the upper and lower endpoints of $\Omega$,
respectively. $\bar\omega$ and $\underline\omega$ may or may not belong to $\Omega$, and in some cases
$\bar\omega = +\infty$, $\underline\omega = -\infty$. The problem for consideration is the estimation of
the quantity $\theta(\omega) = E_\omega(x) = -\beta'(\omega)/\beta(\omega)$ from a single observation $x$
on $X$. There is no loss of generality in restricting our attention to the case
of a single observation for, as is well-known, a sufficient statistic for $n$
observations from an exponential distribution is the sum of the observations
whose distribution is also a member of the exponential family ([1], p. 221).

Admissible estimates of $\theta(\omega)$ will be derived for the different cases
depending on the structure of $\Omega$. We shall consider only classical type
estimates of the form $\gamma x = a_\gamma(x)$ where $\gamma$ is a positive constant. The
value $\gamma = 1$ provides the unique unbiased estimate of $E_\omega(x)$ within this
family $a_\gamma(x)$. The only estimate ordinarily considered is $a_1(x) = x$ and this
appears to be due to the influence the concept of unbiasedness has had on
statistical theory and practice (see our discussion in the introduction). Square
error as a measure of the value of an estimate has been tacitly associated also
with the principle of unbiasedness. Nevertheless, we shall find that from the
point of view of admissibility it is frequently preferable to bias the estimate.
Hodges and Lehmann [4] demonstrated the admissibility of $a_1(x) = x$ for a few
scattered examples. Girshick and Savage [5] showed that provided
$\Omega = (-\infty, \infty)$, $x$ is admissible. Our results cover a substantially larger
subclass of the full exponential family for the whole set of estimates
$a_\gamma(x)$.
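
In view of the relations

$$\frac{1}{\beta(\omega)} = \int e^{\omega x}\,d\mu(x), \qquad -\frac{\beta'(\omega)}{\beta^2(\omega)} = \int x e^{\omega x}\,d\mu(x),$$

we obtain that

$$\rho(\omega, a_\gamma) = \beta(\omega)\int [\gamma x - \theta(\omega)]^2 e^{\omega x}\,d\mu(x) = \gamma^2\,\frac{2(\beta'(\omega))^2 - \beta(\omega)\beta''(\omega)}{\beta^2(\omega)} - (2\gamma - 1)\left(\frac{\beta'(\omega)}{\beta(\omega)}\right)^2.$$

For each $\omega$ in $\Omega$ the minimum of the quadratic expression in $\gamma$ is achieved
uniquely for the value

$$\gamma_0 = \frac{(\beta'(\omega))^2}{2(\beta'(\omega))^2 - \beta(\omega)\beta''(\omega)} = \frac{1}{1 + \dfrac{(\beta'(\omega))^2 - \beta(\omega)\beta''(\omega)}{(\beta'(\omega))^2}}.$$
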
But $(\beta'(\omega))^2 - \beta(\omega)\beta''(\omega) = \beta^2(\omega)\sigma_\omega^2 > 0$ ($\sigma_\omega^2$ = variance of $x$) so
that $0 \le \gamma_0 \le 1$. This inequality satisfied by $\gamma_0$ can also be deduced as a
consequence of the Schwarz inequality on inspection of the second formula for
$\gamma_0$. It follows that for any $\gamma > 1$,
$\rho(\omega, a_\gamma) = \beta(\omega)\int [\gamma x - \theta(\omega)]^2 e^{\omega x}\,d\mu(x)$ is strictly increasing in
$\gamma$ for all $\omega$. Consequently, if $\gamma'$ is chosen satisfying $1 < \gamma' < \gamma$, then
$\rho(\omega, a_{\gamma'}(x)) < \rho(\omega, a_\gamma(x))$ for all $\omega$ in $\Omega$ and therefore $a_\gamma(x)$ is
not admissible. This argument can be extended as follows: Suppose
$[(\beta'(\omega))^2 - \beta(\omega)\beta''(\omega)]/(\beta'(\omega))^2$ ranges between $L$ and $L'$ ($L < L'$) as
$\omega$ traverses the interval $(\underline\omega, \bar\omega)$. Then $\gamma_0$ lies in the range
$(1/(1 + L'), 1/(1 + L)) = J$ and for any $\gamma > 1/(1 + L)$ the same reasoning shows
that $a_\gamma(x)$ is not admissible. Whenever $\Omega$ is not the full infinite interval,
in many circumstances $1/(1 + L) < 1$ and $x$ is therefore not admissible. The
converse implication is not valid. That is, if $\gamma$ lies interior to $J$, then it
is not necessarily true that $a_\gamma(x)$ is admissible. A counter-example may be
provided as follows: Suppose the measure $\mu$ is such that it spreads its entire
mass throughout the interval $1 \le x \le 2$. Then, $\theta(\omega) = E_\omega(x)$ likewise
traverses the interval $[1, 2]$ as $\omega$ varies over the set $\Omega = (-\infty, \infty)$. No
estimate of the form $\gamma x$ ($0 < \gamma < 1$) can be admissible since this entails
estimating $\theta(\omega)$ as less than one with positive probability. Whenever the
observed $x < (1/\gamma)$, which occurs with positive probability, an immediate
improvement of the proposed estimate $\gamma x$ is obtained by estimating $\theta(\omega)$ as 1
in that range. This emphasizes the fact that an estimate $a_{\gamma_0}(x)$, admissible
with respect to all estimates $a_\gamma(x)$, need not be universally admissible.

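We direct attention to the question of admissibility for $a_\gamma(x)$ where $\gamma$ is
in $J$. Suppose $g(x)$ is an estimate which satisfies
$\rho(\omega, g) \le \rho(\omega, a_\gamma)$ for all $\omega$. This inequality may be reduced to the form

$$\int [g(x) - \gamma x]^2 \beta(\omega)e^{\omega x}\,d\mu(x) \le 2\int [\gamma x - g(x)]\,[\gamma x\beta(\omega) + \beta'(\omega)]\,e^{\omega x}\,d\mu(x).$$

Let $dF(\omega) = \beta^\lambda(\omega)\,d\omega$ for constant $\lambda \ne -1$, and let $a, b \in \Omega$, $a < b$.
Also define $T(\omega) = \int_{-\infty}^{\infty} [g(x) - \gamma x]^2 \beta(\omega)e^{\omega x}\,d\mu(x)$. Then, since
$\frac{d}{d\omega}[\beta^{\lambda+1}(\omega)e^{\omega x}] = [(\lambda + 1)\beta^\lambda(\omega)\beta'(\omega) + x\beta^{\lambda+1}(\omega)]e^{\omega x}$,

$$(6)\qquad \int_a^b T(\omega)\beta^\lambda(\omega)\,d\omega \le \frac{2}{\lambda + 1}\int [\gamma x - g(x)]\left[\beta^{\lambda+1}(b)e^{bx} - \beta^{\lambda+1}(a)e^{ax}\right] d\mu(x) + 2\left(\gamma - \frac{1}{\lambda + 1}\right)\int_a^b\!\!\int [\gamma x - g(x)]\,x\,\beta^{\lambda+1}(\omega)e^{\omega x}\,d\mu(x)\,d\omega.$$

Suppose $\gamma = 1/(\lambda + 1)$. Then, the last term in (6) vanishes, and by a proper
application of Schwarz's inequality, (6) becomes (for $\gamma = 1/(\lambda + 1)$)

$$(7)\qquad \int_a^b \beta^\lambda(\omega)T(\omega)\,d\omega \le \frac{2}{|\lambda + 1|}\left[\sqrt{\beta^\lambda(b)}\sqrt{T(b)\beta^\lambda(b)} + \sqrt{\beta^\lambda(a)}\sqrt{T(a)\beta^\lambda(a)}\right].$$

Let $c$ be an interior point of $\Omega$. Suppose $\int_c^b \beta^{-\lambda}(\omega)\,d\omega \to +\infty$ as
$b \to \bar\omega$ and $\int_a^c \beta^{-\lambda}(\omega)\,d\omega \to +\infty$ as $a \to \underline\omega$. Then it follows that
(see Cases 1 and 2 below) $T(\omega) = 0$, a.e. But this requires that
$g(x) = [1/(\lambda + 1)]x$, a.e.; that is, the estimate $x/(\lambda + 1)$ is an admissible
estimate.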


CASE 1.

$$\liminf_{b \to \bar\omega} \beta^\lambda(b)\sqrt{T(b)} = A > 0.$$

Fix $a$ and let $H(b) = \int_a^b \beta^\lambda(\omega)T(\omega)\,d\omega$. By virtue of (7) we can find an
appropriate constant $C > 0$ such that for $b$ sufficiently close to $\bar\omega$,

$$H(b) \le C\sqrt{\beta^\lambda(b)}\sqrt{H'(b)}.$$

This yields by transposition and integration

$$\frac{1}{H(b_1)} - \frac{1}{H(b_2)} \ge \frac{1}{C^2}\int_{b_1}^{b_2} \beta^{-\lambda}(b)\,db,$$

where $b_1$, $b_2$ are chosen so that $b_1 < b_2$, $H(b_1) > 0$. As $b_2 \to \bar\omega$ the
right-hand side tends to $+\infty$ and the left-hand side remains bounded, which is
impossible. Thus, Case 1 cannot occur.

CASE 2.

$$\liminf_{b \to \bar\omega} \beta^\lambda(b)\sqrt{T(b)} = 0.$$

Let $G(a) = \int_a^{\bar\omega} \beta^\lambda(\omega)T(\omega)\,d\omega$. By (7) and the assumption of Case 2 it
follows that $G(a) \le [2/|\lambda + 1|]\sqrt{\beta^\lambda(a)}\sqrt{-G'(a)}$. Suppose there exists an
$a_0$ such that $G(a_0) > 0$. Then

$$\left(\frac{2}{\lambda + 1}\right)^2\left[\frac{1}{G(a_0)} - \frac{1}{G(a_1)}\right] \ge \int_{a_1}^{a_0} \beta^{-\lambda}(a)\,da,$$

where $a_1 < a_0$. As $a_1 \to \underline\omega$ the right-hand side tends to $+\infty$ while the
left-hand side remains bounded. This is impossible so $G(a) \equiv 0$, which implies
$T(\omega) = 0$, a.e. We summarize the conclusions in the statement of a theorem.
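THEOREM 1. Let $p(x, \omega) = \beta(\omega)e^{x\omega}$ describe the density of the exponential
family with respect to a measure $\mu$. If

$$\text{(i)}\qquad \int_c^b \beta^{-\lambda}(\omega)\,d\omega \to +\infty \quad \text{as } b \to \bar\omega$$

and

$$\text{(ii)}\qquad \int_a^c \beta^{-\lambda}(\omega)\,d\omega \to +\infty \quad \text{as } a \to \underline\omega,$$

where $c$ is an interior point of $\Omega = (\underline\omega, \bar\omega)$, then $[1/(\lambda + 1)]x$ is an
admissible estimate of $\theta(\omega) = E_\omega(x)$.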

This theorem subsumes as special consequences all previous known results in 
this direction (see [4] and [5]). We record several specific applications of this 
theorem of special interest. 

I. If $\Omega = (-\infty, \infty)$ and $\mu$ possesses positive measure in each of the
intervals $(0, \infty)$ and $(-\infty, 0)$, then $a_\gamma(x) = \gamma x$ for each $0 < \gamma \le 1$ is
admissible. In fact, the assumptions imply that

$$\beta(\omega) = \left[\int e^{\omega x}\,d\mu(x)\right]^{-1}$$

converges to zero as $|\omega| \to \infty$. Consequently (i) and (ii) hold for each
$\lambda \ge 0$.
II. If $\Omega = (-\infty, \infty)$ and there exists positive probability of observing the
value zero, then $a_\gamma(x) = \gamma x$ for each $0 < \gamma \le 1$ is admissible. The proof
follows readily from Theorem 1 since $\beta(\omega)$ is bounded above.
III. If $\Omega = (-\infty, \infty)$ with no further conditions specified as to the nature of
$\mu$, then at least $a_1(x) = x$ is an admissible estimate of $\theta(\omega)$. This is so
since the hypotheses of Theorem 1 are satisfied for $\lambda = 0$.


IV. If $p(x, \omega) = \dfrac{(-\omega)^\alpha}{\Gamma(\alpha)}\,x^{\alpha - 1}e^{x\omega}$ for $x$ positive where $\alpha > 0$ is
fixed and $\omega$ ranges over $\Omega = (-\infty, 0)$, then $\beta(\omega) = (-\omega)^\alpha$ and
$\theta(\omega) = -\alpha/\omega$. The unique value of $\lambda$ satisfying (i) and (ii) is equal to
$1/\alpha$. Consequently, $\alpha x/(\alpha + 1)$ is the only admissible estimate of
$-(\alpha/\omega)$ of the form $\gamma x$. In the case of $n$ observations $x_1, x_2, \dots, x_n$
with $x_i$ independently normally distributed, mean 0 and variance $\sigma^2$, this
result reduces to the well-known fact that

$$A(x) = [1/(n + 2)]\sum_{i=1}^{n} x_i^2$$

is an admissible estimate of $\sigma^2$. The interval $J$ in this case also reduces to
a unique point.

V. If $d\mu(x) = \tfrac12 e^{-|x|}\,dx$, then $\beta(\omega) = 1 - \omega^2$ and the hypotheses of
Theorem 1 are satisfied with $\lambda \ge 1$. It follows that $a_\gamma(x) = \gamma x$ is
admissible for $\gamma \le \tfrac12$. Also, in this case $J = (0, \tfrac12)$ so that no estimate
of the form $\gamma x$ may be admissible for $\gamma > \tfrac12$.
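
The interval $J$ in application V can be recovered from the formula $\gamma_0 = (\beta')^2/[2(\beta')^2 - \beta\beta'']$ derived earlier in this section. The sketch below evaluates it exactly for $\beta(\omega) = 1 - \omega^2$; the function name `gamma0` is hypothetical, and the closed form $2\omega^2/(1 + 3\omega^2)$ mentioned in the comment is a hand simplification offered as a check rather than a formula from the paper.

```python
from fractions import Fraction

def gamma0(omega):
    """Pointwise risk-minimizing multiplier for beta(omega) = 1 - omega^2."""
    omega = Fraction(omega)
    beta = 1 - omega**2          # beta(omega)
    beta1 = -2 * omega           # beta'(omega)
    beta2 = Fraction(-2)         # beta''(omega)
    # (beta')^2 / (2 (beta')^2 - beta * beta''), which simplifies to
    # 2 omega^2 / (1 + 3 omega^2) for this beta
    return beta1**2 / (2 * beta1**2 - beta * beta2)

grid = [Fraction(k, 100) for k in range(-99, 100)]
values = [gamma0(w) for w in grid]
```

On this grid every value lies in $[0, \tfrac12)$ and approaches $\tfrac12$ as $|\omega| \to 1$, in agreement with $J = (0, \tfrac12)$.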

Further examples of similar type involving definite biasing can be cited. In
numerous examples calculated where $\Omega$ has at least one finite boundary we found
that $a_1(x) = x$ is not admissible. We propose a stronger assertion which
includes this observation. We state as a conjecture that the hypotheses of
Theorem 1 are also necessary conditions for the admissibility of the
corresponding estimate. This would imply in particular that whenever $\beta(\omega)$
approaches infinity exponentially as $\omega$ tends to one of its boundaries no
estimate of the form $\gamma x$ can be admissible.


2. Extreme value densities. In this section we consider densities of the form

$$p(x, \omega) = \begin{cases} q(\omega)r(x), & 0 \le x \le \omega, \\ 0, & \text{otherwise}, \end{cases}$$

where $r(x)$ is assumed to be a positive Lebesgue measurable function of $x$ and
$q^{-1}(\omega) = \int_0^\omega r(x)\,dx < \infty$ for $\omega$ in $\Omega = (0, \infty)$. We further assume that the
monotone decreasing function $q(\omega)$ approaches zero as $\omega \to \infty$, or equivalently
$\int_0^\infty r(x)\,dx = \infty$.

The problem examined concerns estimating functions of the form $[1/q(\omega)]^\alpha$,
$\alpha > 0$. In determining proper estimators attention is directed only to estimates
also of the form $\gamma[1/q(x)]^\alpha = a_\gamma(x)$ where $\gamma$ is a positive constant. It is
reasonable and justifiable to consider only a single observation because of the
fact that $x$ ordinarily represents a sufficient statistic.

Since $r(x) = -q'(x)/q^2(x)$ almost everywhere, we find

$$\rho(\omega, a_\gamma) = q(\omega)\int_0^\omega \left[\frac{\gamma}{q^\alpha(x)} - \frac{1}{q^\alpha(\omega)}\right]^2 r(x)\,dx = \left[\frac{\gamma^2}{2\alpha + 1} - \frac{2\gamma}{\alpha + 1} + 1\right]\frac{1}{q^{2\alpha}(\omega)}.$$

Hence, the minimum of the quadratic expression is achieved uniformly with
respect to $\omega$ for the single choice $\gamma = (2\alpha + 1)/(\alpha + 1)$. For comparison
purposes we note that within the family of estimates considered the unbiased
estimate of $1/q^\alpha(\omega)$ is $(\alpha + 1)/q^\alpha(x)$. The unbiased estimate can therefore be
uniformly improved upon in terms of expected risk by applying the bias factor
$(2\alpha + 1)/(\alpha + 1)^2 < 1$. We proceed to demonstrate the admissibility of the
estimator $[(2\alpha + 1)/(\alpha + 1)]/q^\alpha(x)$ as an estimate of $1/q^\alpha(\omega)$.
The method of proof follows the same general ideas as used in the preceding
section. Suppose $g(x)$ is an estimate which satisfies the property that for all
$\omega$, $\rho(\omega, g) \le \rho(\omega, a_\gamma)$, $\gamma = (2\alpha + 1)/(\alpha + 1)$. Consequently,

$$(10)\qquad \Delta(\omega) = \int_0^\omega \left[g(x) - \frac{\gamma}{q^\alpha(x)}\right]^2 q(\omega)r(x)\,dx \le 2\int_0^\omega \left[\frac{\gamma}{q^\alpha(x)} - g(x)\right]\left[\frac{\gamma}{q^\alpha(x)} - \frac{1}{q^\alpha(\omega)}\right] q(\omega)r(x)\,dx.$$

In order to check admissibility for $a_\gamma(x)$ it is enough to show that the only
$g$ satisfying this system of inequalities is $g(x) = a_\gamma(x)$, a.e. In view of the
formalism indicated in the introduction the aim is to integrate the formula of
(10) with respect to an appropriate monotone increasing function in order to
cause the right-hand side to vanish. This essentially implies admissibility.
Accordingly, we select $dF(\omega) = |q'(\omega)|\,q^\beta(\omega)\,d\omega$ where $\beta = 2\alpha - 1$. Then,
$(\beta + 2)/(\beta + 2 - \alpha) = \gamma = (2\alpha + 1)/(\alpha + 1)$.



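By direct calculation we obtain

$$(11)\qquad \int_\epsilon^\tau \Delta(\omega)\,|q'(\omega)|\,q^\beta(\omega)\,d\omega \le \frac{2}{\beta + 2 - \alpha}\left\{\int_0^\tau \left[\frac{\gamma}{q^\alpha(x)} - g(x)\right] r(x)\,q^{\alpha+1}(\tau)\left[1 - \frac{q^\alpha(\tau)}{q^\alpha(x)}\right] dx - \int_0^\epsilon \left[\frac{\gamma}{q^\alpha(x)} - g(x)\right] r(x)\,q^{\alpha+1}(\epsilon)\left[1 - \frac{q^\alpha(\epsilon)}{q^\alpha(x)}\right] dx\right\}.$$

Since $q(x) \ge q(\tau)$ for $x \le \tau$, we deduce with the aid of Schwarz's inequality
that

$$(12)\qquad \left|\,q^{\alpha+1}(\tau)\int_0^\tau \left[\frac{\gamma}{q^\alpha(x)} - g(x)\right] r(x)\left[1 - \frac{q^\alpha(\tau)}{q^\alpha(x)}\right] dx\,\right| \le q^{\alpha+1}(\tau)\int_0^\tau \left|\frac{\gamma}{q^\alpha(x)} - g(x)\right| \sqrt{r(x)q(\tau)}\,\sqrt{\frac{r(x)}{q(\tau)}}\,dx \le \sqrt{\Delta(\tau)\,|q'(\tau)|\,q^\beta(\tau)}\,\sqrt{\frac{q(\tau)}{|q'(\tau)|}}.$$

In a similar way the second integral of (11) has a bound equal to

$$\sqrt{\Delta(\epsilon)\,|q'(\epsilon)|\,q^\beta(\epsilon)}\,\sqrt{\frac{q(\epsilon)}{|q'(\epsilon)|}}.$$

By combining the relations of (11) and (12) and the last stated bound, we
obtain

$$(13)\qquad \int_\epsilon^\tau \Delta(\omega)\,|q'(\omega)|\,q^\beta(\omega)\,d\omega \le \frac{2}{\alpha + 1}\left[\sqrt{\Delta(\tau)\,|q'(\tau)|\,q^\beta(\tau)}\,\sqrt{\frac{q(\tau)}{|q'(\tau)|}} + \sqrt{\Delta(\epsilon)\,|q'(\epsilon)|\,q^\beta(\epsilon)}\,\sqrt{\frac{q(\epsilon)}{|q'(\epsilon)|}}\right].$$

The analysis proceeds by examining two possible cases.

CASE 1.

$$\liminf_{\tau \to \infty} \sqrt{\Delta(\tau)\,|q'(\tau)|\,q^\beta(\tau)}\,\sqrt{\frac{q(\tau)}{|q'(\tau)|}} = A > 0.$$

Fix $\epsilon$ and set $H(\tau) = \int_\epsilon^\tau \Delta(\omega)\,|q'(\omega)|\,q^\beta(\omega)\,d\omega$. There exists a constant
$C$ such that for sufficiently large $\tau$

$$(14)\qquad H(\tau) \le C\sqrt{H'(\tau)}\,\sqrt{\frac{q(\tau)}{|q'(\tau)|}}.$$

We now show that this relation leads to an absurdity. Indeed, squaring the
expression of (14) and solving the differential inequality

$$\frac{H'(\tau)}{H^2(\tau)} \ge \frac{1}{C^2}\,\frac{|q'(\tau)|}{q(\tau)}$$

we deduce that

$$(15)\qquad C^2\left[\frac{1}{H(\tau_1)} - \frac{1}{H(\tau_2)}\right] \ge \log\frac{q(\tau_1)}{q(\tau_2)},$$

where $\tau_2 > \tau_1$ and $\tau_1$ is sufficiently large. As $\tau_2 \to \infty$ the left-hand side of
(15) remains bounded while the right-hand side tends to $+\infty$, which is
impossible. Thus, Case 1 cannot occur.

CASE 2.

Let $\tau$ tend to $+\infty$ along a sequence $\{\tau_n\}$ for which

$$\lim_{n \to \infty} \sqrt{\Delta(\tau_n)\,|q'(\tau_n)|\,q^\beta(\tau_n)}\,\sqrt{\frac{q(\tau_n)}{|q'(\tau_n)|}} = 0.$$

Then, by (13),

$$(16)\qquad G(\epsilon) = \int_\epsilon^\infty \Delta(\omega)\,|q'(\omega)|\,q^\beta(\omega)\,d\omega \le \frac{2}{\alpha + 1}\sqrt{\Delta(\epsilon)\,|q'(\epsilon)|\,q^\beta(\epsilon)}\,\sqrt{\frac{q(\epsilon)}{|q'(\epsilon)|}}.$$

Suppose $G(\epsilon_0) > 0$. Then $G(\epsilon) \ge G(\epsilon_0) > 0$ for $\epsilon \le \epsilon_0$. (16) can be written as
$G(\epsilon) \le [2/(\alpha + 1)]\sqrt{-G'(\epsilon)}\,\sqrt{q(\epsilon)/|q'(\epsilon)|}$. Transposition of terms in this
expression and integration over $(\epsilon_1, \epsilon_0)$ yields

$$(17)\qquad \left(\frac{2}{\alpha + 1}\right)^2\left[\frac{1}{G(\epsilon_0)} - \frac{1}{G(\epsilon_1)}\right] \ge \log\frac{q(\epsilon_1)}{q(\epsilon_0)}.$$

As $\epsilon_1 \to 0$ the left-hand side remains bounded but the right-hand side tends to
$+\infty$, which is an absurdity. Thus the supposition that $G(\epsilon) > 0$ for some
$\epsilon > 0$ is erroneous, and therefore $G(\epsilon) \equiv 0$. Consequently, $\Delta(\omega) = 0$, a.e.,
which implies $g(x) = \gamma/q^\alpha(x)$, a.e.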

We have thus established the truth of 

THEOREM 2. There exists a single admissible estimate of $1/q^\alpha(\omega)$ of the form
$\gamma/q^\alpha(x)$, and this is given by $\gamma = (2\alpha + 1)/(\alpha + 1)$.

The following specific application might be of some interest. Let
$r(x) = nx^{n-1}$. Then $[(2 + n)/(1 + n)]x$ is an admissible estimate of $\omega$.
Furthermore, this is the only admissible estimate which is a multiple of $x$.

This states that if $x_1, x_2, \dots, x_n$ represent $n$ independent observations
from a rectangular density spread on the interval $(0, \omega)$, then

$$[(n + 2)/(n + 1)] \max_i x_i$$

is an admissible estimate of $\omega$ with respect to squared error.
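
The multiplier $(n + 2)/(n + 1)$ can be compared with two familiar alternatives using only the moments of the sample maximum. The sketch below relies on the standard facts $E(y) = n\omega/(n + 1)$ and $E(y^2) = n\omega^2/(n + 2)$ for the maximum $y$ of $n$ uniform observations on $(0, \omega)$; the function name is hypothetical, and the risk is expressed in units of $\omega^2$ so that the comparison holds for every $\omega$.

```python
from fractions import Fraction

def scaled_max_risk(c, n):
    """Exact E(c*Y - omega)^2 / omega^2, Y = max of n uniforms on (0, omega)."""
    c = Fraction(c)
    mean_ratio = Fraction(n, n + 1)      # E(Y) / omega
    second_ratio = Fraction(n, n + 2)    # E(Y^2) / omega^2
    return c**2 * second_ratio - 2 * c * mean_ratio + 1

n = 5
risk_admissible = scaled_max_risk(Fraction(n + 2, n + 1), n)  # multiplier of Theorem 2
risk_unbiased = scaled_max_risk(Fraction(n + 1, n), n)        # unbiased multiple of the maximum
risk_maximum = scaled_max_risk(1, n)                          # the maximum itself
```

For `n = 5` the three normalized risks are `1/36`, `1/35`, and `1/21`, so the unbiased estimate is uniformly improved by the slightly smaller multiplier, as the text asserts.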

To pinpoint the reasons for the validity of the preceding methodology it
seems worth emphasizing that although for any $\gamma$ it is possible to construct a
measure $q^\beta(\omega)\,|q'(\omega)|$ which formally gives

$$\frac{\gamma}{q^\alpha(x)} = \frac{\int_x^{+\infty} q^{\beta + 1 - \alpha}(\omega)\,q'(\omega)\,d\omega}{\int_x^{+\infty} q^{\beta + 1}(\omega)\,q'(\omega)\,d\omega},$$

nevertheless, the reader will find that it is only possible to justify the
formalism for the special choices $\beta = 2\alpha - 1$ and $\gamma = (2\alpha + 1)/(\alpha + 1)$ as we
have done. The estimate $[(2\alpha + 1)/(\alpha + 1)]/q^\alpha(x)$ of $1/q^\alpha(\omega)$, although
uniquely admissible with respect to square error, is still not altogether
acceptable. It is disturbing to note that the estimate
$[(2\alpha + 1)/(\alpha + 1)]/q^\alpha(x)$ is very closely tied to the measure of error
described by quadratic loss. If the risk function is given by

$$\rho(\omega, a_\gamma(x)) = q(\omega)\int_0^\omega \left[\frac{\gamma}{q^\alpha(x)} - \frac{1}{q^\alpha(\omega)}\right]^{2k} r(x)\,dx = \frac{1}{q^{2k\alpha}(\omega)}\int_0^1 (\gamma t^\alpha - 1)^{2k}\,dt,$$

it can be shown that the minimum is achieved uniformly in $\omega$ at a value
$\gamma_k$ which strictly varies with $k$. This implies that an admissible estimate
with respect to square error need not be admissible when considered for the
error function involving 4th powers. It is found that when $\alpha = 1$ in the case of
square error the best estimate of the type $\gamma/q(x)$ is $\tfrac32 \cdot 1/q(x)$ while for
the loss function of fourth powers the best estimate is $\gamma^*/q(x)$ where
$\gamma^* > 0$ satisfies

$$\tfrac45\gamma^3 - 3\gamma^2 + 4\gamma - 2 = 0,$$

which is slightly larger than $\tfrac32$.
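
The fourth-power comparison can be checked numerically. For $\alpha = 1$ and fourth-power loss the risk is proportional to $\int_0^1 (\gamma t - 1)^4\,dt = [(\gamma - 1)^5 + 1]/(5\gamma)$, and setting its derivative to zero leads to the cubic $\tfrac45\gamma^3 - 3\gamma^2 + 4\gamma - 2 = 0$. The sketch below locates the root by plain bisection; the function names and the bracketing interval $[1.5, 2]$ are illustrative choices, not taken from the paper.

```python
def fourth_power_condition(g):
    """Left side of (4/5) g^3 - 3 g^2 + 4 g - 2 = 0."""
    return 0.8 * g**3 - 3.0 * g**2 + 4.0 * g - 2.0

def normalized_fourth_power_risk(g):
    """Integral of (g*t - 1)^4 over 0 <= t <= 1 (risk for alpha = 1 up to 1/q^4)."""
    return ((g - 1.0) ** 5 + 1.0) / (5.0 * g)

def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_star = bisect_root(fourth_power_condition, 1.5, 2.0)
```

The root lies between 1.60 and 1.61, slightly above the square-error value 3/2, and the normalized fourth-power risk at `gamma_star` is smaller than at 1.5.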
We close this section with a brief discussion of the problem of admissibility 
for the density

$$p(x, \omega) = \begin{cases} q(\omega)r(x), & x \ge \omega, \\ 0, & x < \omega, \end{cases}$$

where r(x) is a positive measurable function ol and 7] U Ju TL) ar < t. 
for w in 2 Wp, < 
One important such example is furnished by taking r(r e*, q(w é 
; t 
and 2 -x, «), Another example is obtained by setting r(z) ve ,e > 3 
and w 0. As before our problem is to estimate the quantity 1/q*(w) by using 
estimates of the form y/q*(r). We assume in what follows that q(w 0 or 
1 2 } 
equivalently 3, r(x) dr L 
THrorem 3. /f 
( w)r(a ~ 
Pp I Ww = 
0, wo S w, 
where q (uw (2 r(x) dx and qlw Q, then |(2a + 1)/(a + 1)]/q*(x) is an 


admissible estimate of $1/q^\alpha(\omega)$ with respect to quadratic loss.
The proof of Theorem 3 parallels that of Theorem 2 subject to simple obvious
modifications and will therefore be omitted.


3. Translation parameter problem: single observation. The random variable
$X$ is distributed according to the probability density $p(x, \omega) = p(x + \omega)$
where $\omega \in \Omega$ is the unknown state of nature and $p(\xi)$ is a known, fixed
density function which satisfies $\int_{-\infty}^{\infty} \xi p(\xi)\,d\xi = 0$. The analogous problem
where $X$ is an integer-valued random variable and the parameter likewise ranges
over the set of discrete integers will be discussed later. The problem is to
estimate the parameter $-\omega$. If $x$ is the single observed value, then the usual
(unbiased, invariant) estimate of $-\omega$ is $\delta(x) = x$. The property of
unbiasedness is easily verified and for its relationship to invariance the
reader is referred to [1]. The principal goal of this section is to establish
the admissibility of this estimate, $\delta(x) = x$, subject to appropriate
smoothness conditions.

This formulation of the translation parameter problem differs notationally 
from the customary version. If w is substituted for —w, then the familiar form 
of the problem will emerge. The difference in the formulation of the problem is 
not significant in any way and on the other hand is helpful in that it leads to 
a more convenient form for applying theorems on Fourier transforms. 

To establish admissibility it is sufficient to show that the inequality

$$\rho(\omega, g) \le \rho(\omega, \delta),$$

or equivalently

$$(18)\qquad \int [x - g(x)]^2 p(x + \omega)\,dx \le 2\int [x - g(x)]\,[x + \omega]\,p(x + \omega)\,dx,$$

implies $g(x) = x$, a.e.
To accomplish this it is necessary to impose the following assumption.

ASSUMPTION I.

$$\int_{-\infty}^{\infty} \xi^2 p(\xi)\,d\xi < \infty, \qquad \int_{-\infty}^{\infty} \xi^2 p^2(\xi)\,d\xi < \infty, \qquad \text{and} \qquad \int_{-\infty}^{\infty} \xi p(\xi)\,d\xi = 0.$$

The meaning and relevance of the last condition was discussed above. The
first integrability requirement is indispensable in order that (18) define a
meaningful relationship. The second finiteness condition represents a slight
further restriction beyond that of the first integral. For instance, the second
integrability condition would be an immediate consequence of the first
integrability condition and boundedness of the density $p(\xi)$.

We further assume initially that we deal only with alternative estimates
$g(x)$ satisfying $|g(x) - x| \le M < \infty$. The nature of this restriction is
considerably milder than might appear at first glance. It will later be shown
that this constraint may be completely eliminated or, equivalently, we will show
that the only estimates for which (18) is possible must satisfy this restraint.

Unless stated to the contrary we suppose hereafter that Assumption I and 
the boundedness requirement on competing estimates are satisfied. 


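LEMMA 1. If $\rho(\omega, g) \le \rho(\omega, \delta)$ for all $\omega$, then $\int_{-\infty}^{\infty} (g(x) - x)^2\,dx < \infty$.
PROOF. Define $\Phi(u) = \int_{-\infty}^{u} \xi p(\xi)\,d\xi$. Then

$$(19)\qquad \int_{-n}^{n}\left[\int [x - g(x)]^2 p(x + \omega)\,dx\right] d\omega \le 2\int |x - g(x)|\,\left|\int_{-n}^{n} (x + \omega)\,p(x + \omega)\,d\omega\right| dx \le 2M\int |\Phi(x + n) - \Phi(x - n)|\,dx \le 4M\int_{-\infty}^{\infty} (-\Phi(u))\,du$$

as $-\Phi(u)$ is positive. But,

$$\left|u\int_{-\infty}^{u} \xi p(\xi)\,d\xi\right| \le \int_{-\infty}^{u} \xi^2 p(\xi)\,d\xi \to 0 \quad \text{as } u \to -\infty$$

and

$$\left|u\int_{-\infty}^{u} \xi p(\xi)\,d\xi\right| = \left|u\int_{u}^{\infty} \xi p(\xi)\,d\xi\right| \le \int_{u}^{\infty} \xi^2 p(\xi)\,d\xi \to 0 \quad \text{as } u \to +\infty,$$

so

$$|u\Phi(u)| \to 0 \quad \text{as } |u| \to \infty.$$

Hence, integration by parts yields

$$\int_{-\infty}^{\infty} (-\Phi(u))\,du = \int_{-\infty}^{\infty} \xi^2 p(\xi)\,d\xi < \infty.$$

Allowing $n$ to tend to infinity in (19) after interchanging the order of
integration produces the desired result.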
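
THEOREM 4. If $\rho(\omega, g) \le \rho(\omega, \delta)$ for all $\omega$, then $g(x) = \delta(x)$, a.e.

PROOF. By Lemma 1 $\theta_1(\xi) = \xi - g(\xi) \in L^2$ so by Plancherel's theorem its
Fourier transform $\varphi_1(u)$ is defined and belongs to $L^2$. According to
Assumption I $\theta_2(\xi) = \xi p(\xi) \in L^2$ so also its Fourier transform $\varphi_2(u)$ exists
and

$$\varphi_2(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{iu\xi}\,\theta_2(\xi)\,d\xi$$

(since the integral exists). The function
$\theta(\omega) = \int_{-\infty}^{\infty} \theta_1(x)\theta_2(x + \omega)\,dx$, which is essentially a convolution of
$\theta_1$ and $\theta_2$, also belongs to $L^2$. It can be readily verified that, up to the
constant factor $\sqrt{2\pi}$, its Fourier transform is
$\varphi(u) = \varphi_1(-u)\varphi_2(u)$ ($\in L^2$).

Since $\varphi_1$ and $\varphi_2$ both belong to $L^2$, by Schwarz's inequality

$$\varphi = \varphi_1(-u)\varphi_2(u) \in L^1.$$

By the inversion theorem on Fourier transforms

$$(20)\qquad \int_{-\infty}^{\infty} (x - g(x))(x + \omega)\,p(x + \omega)\,dx = \int_{-\infty}^{\infty} e^{-i\omega u}\,\varphi_1(-u)\varphi_2(u)\,du$$

for real $\omega$ as both sides represent continuous functions, the first by virtue of
the fact that $\xi p(\xi)$ is in $L^2$ and the second since $\varphi_1 \cdot \varphi_2$ is in $L^1$ as
established. If both sides of (20) are integrated from $-n$ to $n$ and the order
of integration is reversed on the right-hand side (which is permissible for
reasons indicated below)

$$\int_{-n}^{n}\left[\int_{-\infty}^{\infty} (x - g(x))(x + \omega)\,p(x + \omega)\,dx\right] d\omega = \int_{-\infty}^{\infty} \varphi_1(-u)\varphi_2(u)\,\frac{e^{inu} - e^{-inu}}{iu}\,du.$$

To justify the interchange on the right-hand side we observe first that
$\varphi_2(0) = 0$ while $\varphi_2'(u) = [i/\sqrt{2\pi}]\int_{-\infty}^{\infty} e^{iu\xi}\xi^2 p(\xi)\,d\xi$ is bounded
independently of $u$. By the mean value theorem $\varphi_2(u)/u = \varphi_2'(u_0)$ where
$0 \le u_0 \le u$. Thus,

$$\limsup_{u \to 0} |\varphi_2(u)/u| < \infty$$

and $\int_{|u| \ge \epsilon} |\varphi_2(u)/u|^2\,du < \infty$ for $\epsilon > 0$. This implies
$[\varphi_1(-u)\varphi_2(u)]/u \in L^1$. But the Riemann-Lebesgue theorem asserts that for any
$q \in L^1$, $\int_{-\infty}^{\infty} e^{i\omega u}q(u)\,du \to 0$ as $\omega \to \pm\infty$. Therefore,

$$\lim_{n \to \infty} \int_{-n}^{n}\left[\int_{-\infty}^{\infty} (x - g(x))(x + \omega)\,p(x + \omega)\,dx\right] d\omega = 0$$

and on account of (18) we may infer that

$$\int_{-\infty}^{\infty} (x - g(x))^2\,dx \le 0$$

which implies $g(x) = x$, a.e.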
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As mentioned previously, there are two general cases in which the bounded- 
ness assumption is satisfied by all g which need be considered. 


CASE 1:

\[
p(\xi) \begin{cases} = 0, & \xi < a \text{ or } \xi > b, \quad -\infty < a < b < \infty, \\ \ge 0, & \text{otherwise.} \end{cases}
\]
This type of density is fairly general and will occur, for instance, when any 
distribution is truncated at finite endpoints. 


Suppose $x$ is the observed value. Then, because of the form of $p(\xi)$,

\[
x - b \le -\omega \le x - a.
\]


Any estimate which assumes values outside the interval $[x - b,\ x - a]$ can be
improved upon by an estimate $h(x)$ which satisfies $x - b \le h(x) \le x - a$ for
all $x$. Intuitively this is clear; a rigorous proof may be readily supplied by the
reader. Thus if $\rho(\omega, g) \le \rho(\omega, \delta)$ for all $\omega$ and $g$ does
not satisfy the boundedness assumption, there exists another estimate $h(x)$ such that

\[
|h(x) - x| \le \max(|a|, |b|)
\]

and $\rho(\omega, h) \le \rho(\omega, g) \le \rho(\omega, \delta)$ for all $\omega$. By
Theorem 4 this implies $h(x) = x$, a.e., and hence
$\rho(\omega, h) = \rho(\omega, \delta)$, so that
$\rho(\omega, g) = \rho(\omega, \delta)$, which implies that $\delta$ is admissible.
In addition, note that Assumption I is automatically satisfied in this case.
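
The improvement step in Case 1 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a uniform density on $[a, b] = [-1, 1]$ and a deliberately erratic competitor (both are illustrative choices, and the names `clip_to_support`, `wild_estimate`, and `count_violations` are hypothetical). Since the target $-\omega$ always lies in $[x - b,\ x - a]$, projecting an estimate onto that interval can never increase the squared loss.

```python
import math
import random


def clip_to_support(g_value, x, a, b):
    """Project an estimate of -omega onto the interval [x - b, x - a]."""
    return min(max(g_value, x - b), x - a)


def wild_estimate(x):
    """Hypothetical competitor that often leaves [x - b, x - a]."""
    return x + 3.0 * math.sin(5.0 * x)


def count_violations(n_trials=10000, seed=0, a=-1.0, b=1.0):
    """Count trials where the clipped estimate has larger loss (a + omega)^2."""
    rng = random.Random(seed)
    worse = 0
    for _ in range(n_trials):
        omega = rng.uniform(-10.0, 10.0)
        xi = rng.uniform(a, b)          # draw from the assumed density on [a, b]
        x = xi - omega                  # observation with density p(x + omega)
        g = wild_estimate(x)
        h = clip_to_support(g, x, a, b)
        # Small tolerance guards against floating-point rounding only.
        if (h + omega) ** 2 > (g + omega) ** 2 + 1e-12:
            worse += 1
    return worse
```

Calling `count_violations()` should report no trial in which clipping hurts, in line with the pointwise argument left to the reader in the text.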
CASE 2: $p(x, \omega) = p(x + \omega)$ has a monotone likelihood ratio.

LEMMA 2. If $\rho(\omega, g) \le \rho(\omega, \delta)$ for all $\omega$, then there
exists a constant $C$ such that

\[
\int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + \omega)\,dx \le C
\]

for all $\omega$ (under Assumption I).
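PROOF. By Schwarz's inequality

\[
\int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + \omega)\,dx
 \le 2 \left[ \int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + \omega)\,dx \right]^{1/2}
 \left[ \int_{-\infty}^{\infty} (x + \omega)^2\,p(x + \omega)\,dx \right]^{1/2}.
\]

It follows easily that

\[
\int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + \omega)\,dx \le 4 \int_{-\infty}^{\infty} \xi^2 p(\xi)\,d\xi.
\]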


Without loss of generality it can be assumed that $g$ is a monotone estimate
(i.e., $x_1 < x_2$ implies $g(x_1) \le g(x_2)$). Since the monotone estimates
constitute a complete class (cf. [3]), any estimate which improves upon $\delta$ and is
not monotone is in turn improved upon by a monotone estimate.

We add for the purposes of convenience the following assumption, which is 
so exceptionally weak as not to constitute any real restriction. 

ASSUMPTION II. There exist constants $a_1$, $a_2$, $b$ such that $a_1 < a_2$,
$a_2 - a_1 < 1$, $b > 0$, and $p(\xi) \ge b$ for $a_1 \le \xi \le a_2$.

Suppose there exists a sequence $\{x_i\}$ for which $g(x_i) - x_i \to \infty$ as
$i \to \infty$. Then, there must exist for any $n$ an index $i_n$ such that
$g(x_{i_n}) - x_{i_n} \ge n$. Since $g$ is monotone, $g(\xi) - \xi \ge n - 1$ for
$x_{i_n} \le \xi \le x_{i_n} + 1$. Let

\[
\omega_0 = \tfrac{1}{2}(a_1 + a_2) - \left( x_{i_n} + \tfrac{1}{2} \right).
\]

Then,

\[
b\,(a_2 - a_1)(n - 1)^2 \le \int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + \omega_0)\,dx.
\tag{22}
\]

But by Lemma 2 the integral is bounded by $C < \infty$. Since $n$ is arbitrary, this
leads to a contradiction.

A similar argument applies if there exists a subsequence $\{x_i\}$ such that
$g(x_i) - x_i \to -\infty$ as $i \to \infty$. Thus, $g(x) - x$ must remain bounded and
the admissibility of $\delta(x) = x$ then follows according to Theorem 4 in the case
where $p(x + \omega)$ has a monotone likelihood ratio.

The preceding argument also shows that in order for $|g(x) - x|$ to be
unbounded and consistent with the result of Lemma 2 it must peak up
very sparsely, over intervals of increasingly shorter length. Such pathologies
are not excluded readily by means of our methods except for the two cases
discussed. It seems unreasonable to admit such estimates for consideration.

A third case for which Theorem 4 is valid without the assumption of bounded- 
ness being necessary corresponds to the situation where p(t) tends to zero ex- 
ceptionally fast. More precisely, we assume that 


\[
\frac{\left| \int_{-\infty}^{u} \xi p(\xi)\,d\xi \right|}{p(u)} \le K < \infty.
\]

For example this is satisfied by the standard normal distribution. The boundedness
assumption was only used to prove Lemma 1, which on closer inspection
is also valid whenever we can show that the expressions

\[
A(n) = \int_{-\infty}^{\infty} |g(x) - x|\,[-\Phi(x + n)]\,dx
\]

and

\[
B(n) = \int_{-\infty}^{\infty} |g(x) - x|\,[-\Phi(x - n)]\,dx
\]

are uniformly bounded. We study only the case of $A(n)$, the argument being
similar for $B(n)$. Invoking Schwarz's inequality, we obtain

\[
A(n) \le \sqrt{\int_{-\infty}^{\infty} (g(x) - x)^2\,(-\Phi(x + n))\,dx}\ \sqrt{\int_{-\infty}^{\infty} (-\Phi(x))\,dx}
 \le C' \left\{ \int_{-\infty}^{\infty} (g(x) - x)^2\,p(x + n)\,dx \right\}^{1/2} < \infty,
\]

where the last inequality is valid because of Lemma 2.

What we have shown for the general problem is that $\delta(x) = x$ is admissible
within the class of all estimates $g$ satisfying $|g(x) - x| \le M$ for all $x$, where
$M$ is any finite constant. This means that $x$ is admissible with respect to all
estimates which do not differ too wildly from it. An appropriate formulation of
the conclusions may be made in terms of a concept of local admissibility.

We close this section with a brief discussion of the case where the observation
is integer-valued and the parameter $\omega$ also traverses the set of all integers. The
analysis is considerably simpler.

In this case we can deduce immediately from the analog of Lemma 2 and
equation (22) that if $\rho(\omega, g) \le \rho(\omega, \delta)$ for all integral $\omega$
then $|g(x) - x|$ is uniformly bounded.


Assumption I may be slightly weakened and now takes the form: 


\[
\sum_{x} x^2 p(x) < \infty \quad \text{and} \quad \sum_{x} x\,p(x) = 0.
\tag{24}
\]
The role of Fourier transforms in the analysis is now replaced by Fourier series 
and the general line of the arguments carries over to the discrete case mutatis 
mutandis. Summing up we get: 
THEOREM 5. Suppose $x$ and $\omega$ are discrete and integer-valued, and the conditions
(24) are satisfied. If $\rho(\omega, g) \le \rho(\omega, \delta)$ where $\delta(x) = x$,
then $g(x) = x$, a.e.


4. Translation parameter problem for the loss function $L(a, \omega) = (a + \omega)^{2N}$
with one observation. In the preceding section $\delta(x) = x$ was seen to be an
admissible estimate of $-\omega$ in the translation parameter problem when the loss
function is $L(a, \omega) = (a + \omega)^2$. Under suitable assumptions which are analogous
to Assumption I and the boundedness restriction it will now be shown that
$\delta(x) = x$ is also admissible for the loss function $L(a, \omega) = (a + \omega)^{2N}$.
Note that consideration is still restricted to the case of a single observation.

The assumptions imposed are the following: 

ASSUMPTION III. $p(\xi)$ is symmetric, i.e., $p(\xi) = p(-\xi)$.

Because of this assumption the odd moments of $p(\xi)$ vanish. This property
is used in a crucial way.

If $h(x)$ is an estimate which presumably improves on $\delta(x)$, then we shall
assume

ASSUMPTION IV. There exists a constant $M > 1$ such that $|h(x) - x| \le M$
for all $x$.


Several remarks will be appended pertaining to this assumption after the 
completion of the theorem. For suitable general classes of densities p we will 
find as before that Assumption IV is unnecessary. 
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- x.) oo Pe 6 oF 3 rs - 
ASSUMPTION V. Ce & Pls) dé < @, a £ P \&) dé “ x 


It is readily verified that the analogue of equation (18) is the following: 


2 2N-—2k—1 
(c + w) p(x +w) dz. 


The proofs of this section are completely analogous to those of the preceding
section. Consequently, the detailed proofs will be shortened appropriately.

LEMMA 3. If $\rho(\omega, h) \le \rho(\omega, \delta)$ for all $\omega$, then
$\int_{-\infty}^{\infty} |x - h(x)|^{\alpha}\,dx < \infty$ for $\alpha \ge 2$ (under
Assumptions III, IV, and V).


Proor. By (25), Fubini’s theorem, and Assumption IV 


where #,(u) | 
$,(u) e L. Thus 


K i mM** 


where C is a constant indepen 


kal \Sh wx 


THEOREM 6. Let Assumptions III, IV, and V be satisfied. Then
$\rho(\omega, h) \le \rho(\omega, \delta)$ for all $\omega$ implies that
$h(x) = \delta(x)$, a.e.

Proor. The proof is obtained by adapting appropriately the methods em- 
ployed in the discussion of Theorem 4. The details are omitted. 

A few remarks promised earlier concerning Assumption IV will now be given. 
As in Section 3 for the case

\[
p(\xi) \begin{cases} = 0, & \xi < a \text{ or } \xi > b, \quad -\infty < a < b < \infty, \\ \ge 0, & \text{otherwise,} \end{cases}
\]

the only type of estimate which need be considered is an estimate $h(x)$ satisfying
Assumption IV. The proof is the same as before. The argument for the
second case in which $p(x, \omega) = p(x + \omega)$ has a monotone likelihood ratio is
almost the same. It depends on Lemma 4, which may be derived with the aid
of the Hölder inequality.

LEMMA 4. If $\rho(\omega, h) \le \rho(\omega, \delta)$ for all $\omega$, then there exists
a constant $C$ such that

\[
\int_{-\infty}^{\infty} (x - h(x))^{2N}\,p(x + \omega)\,dx < C
\]

for all $\omega$ (under Assumption V).


5. Translation parameter problem: n observations. The problem studied in
this section is the multi-observation analogue of the problem treated in Section
3. Let $x_1, \ldots, x_n$ be $n$ independent observations on the random variable $X$,
where $X$ is distributed according to the density function
$p(x, \omega) = p(x + \omega)$, $\omega \in (-\infty, \infty)$, with $p(\xi)$ a known
prescribed density. Alternatively, $X$ is allowed to be an integer-valued random variable
with $\omega$ likewise assuming only integer values, so that
$P\{X = j \mid \omega\} = p(j + \omega)$ where the probabilities $p(j)$,
$j = 0, \pm 1, \pm 2, \ldots$, are assumed known. As previously the location parameter
$-\omega$ is to be estimated.

Define $y_i = x_i - x_1$, $i = 2, \ldots, n$. An appealing estimate for the
parameter $-\omega$ which was proposed by Pitman and has the property of being invariant
with respect to translations of the observations $x_i$ is

\[
\delta^*(x_1, x_2, \ldots, x_n) = x_1 - T(y_2, \ldots, y_n),
\tag{26}
\]

where

\[
T(y_2, \ldots, y_n) = \frac{\int_{-\infty}^{\infty} \xi\,p(\xi) \prod_{i=2}^{n} p(y_i + \xi)\,d\xi}{\int_{-\infty}^{\infty} p(\xi) \prod_{i=2}^{n} p(y_i + \xi)\,d\xi}.
\]

Invariance of $\delta^*$ means that

\[
\delta^*(x_1 + c, x_2 + c, \ldots, x_n + c) = c + \delta^*(x_1, x_2, \ldots, x_n)
\]

for each constant $c$, an obviously desirable property when dealing with an
unknown location parameter. It is well known that $\delta^*$ is an invariant minimax
estimator of $-\omega$ (cf. [5]).
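
The estimator (26) is easy to evaluate numerically. The sketch below is illustrative only: it approximates both integrals in $T(y_2, \ldots, y_n)$ by the trapezoidal rule on a fixed grid, and the function names, the grid limits, and the choice of the standard normal density are assumptions made for the example, not part of the original text. For the standard normal density the estimate should reduce to the sample mean, which gives a convenient check, as does the invariance property.

```python
import math


def normal_pdf(t):
    """Standard normal density, used here only as an example of p."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def pitman_estimate(xs, pdf, lo=-12.0, hi=12.0, steps=4800):
    """Approximate delta*(x_1, ..., x_n) = x_1 - T(y_2, ..., y_n).

    T is the ratio of two integrals over xi, both approximated with the
    trapezoidal rule on [lo, hi]; the grid must cover the region where
    p(xi) * prod p(y_i + xi) is not negligible (an assumption of this sketch).
    """
    x1 = xs[0]
    ys = [x - x1 for x in xs[1:]]       # y_i = x_i - x_1, i = 2, ..., n
    h = (hi - lo) / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps + 1):
        xi = lo + k * h
        weight = pdf(xi)
        for y in ys:
            weight *= pdf(y + xi)
        if k == 0 or k == steps:        # trapezoidal end-point weights
            weight *= 0.5
        numerator += xi * weight
        denominator += weight
    return x1 - numerator / denominator
```

For instance, `pitman_estimate([0.3, -1.2, 2.0, 0.7], normal_pdf)` should agree closely with the sample mean of the four observations, and adding a constant $c$ to every observation should shift the result by $c$.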

Girshick and Savage [5] in discussing estimating procedures associated with
quadratic loss conjectured that the estimator (26) is unique minimax. Since
the risk of the estimate $\delta^*$ is identically constant it follows that in order to
substantiate this conjecture it is enough to show that $\delta^*$ is admissible. This has
been verified by Blackwell for the special case where both $X$ and $\omega$ are
essentially integer-valued and where $p(j)$ vanishes except for at most a finite number
of $j$ [7]. He also constructed an example in which $X$ traversed a discrete set and
the range of $\omega$ was also discrete with values incommensurate with the possible
$X$ values, and he showed that $\delta^*$ need not be admissible in this case. This is
not at all surprising in view of the fact that the usual demands corresponding
to invariance in essence necessitate that the possible values of $X$ and the $\omega$
values should comprise the same group structure. This characteristic was
violated in the example of Blackwell.

We shall establish the admissibility of $\delta^*$ as an estimate of $-\omega$ in three
separate cases which include most of the common distributions. In two of the cases
we deal with densities of a continuous real variable for which $\omega$ traverses the
real line. The third case examined is the general discrete problem where $X$ and
$\omega$ range over the integers. Blackwell's result for discrete densities with bounded
domain emerges as a special case.

The convolution character of the location parameter problem suggests a
representation of the problem in terms of Fourier integrals. It is therefore
natural for our arguments to appeal to the powerful, well-developed techniques
of Fourier analysis, which we in fact use abundantly. Our methods
consequently apply to a considerably wider class of distributions which includes
most of the common situations. The sequence of lemmas established follows
principally the line of reasoning of the analogous single observation case and
may be considered an extension thereof.

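The three cases require separate analysis because of the different regularity
assumptions needed for each. To establish the admissibility of
$\delta^*(x_1, y_2, \ldots, y_n)$ we must show that if the inequality

\[
\begin{aligned}
\rho(\omega, g) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [g(x_1, y_2, \ldots, y_n) + \omega]^2\,
 p(x_1 + \omega)\,p(x_1 + \omega + y_2) \cdots p(x_1 + \omega + y_n)\,dx_1\,dy_2 \cdots dy_n \\
 &\le \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\delta^*(x_1, y_2, \ldots, y_n) + \omega]^2\,
 p(x_1 + \omega)\,p(x_1 + \omega + y_2) \cdots p(x_1 + \omega + y_n)\,dx_1\,dy_2 \cdots dy_n \\
 &= \rho(\omega, \delta^*) = c
\end{aligned}
\tag{28}
\]

is valid for each $\omega$ then $g = \delta^*$, a.e. For the discrete case (i.e., $X$ and
$\omega$ are integer-valued) the integral is to be replaced by the appropriate summation.
The inequality (29) below is equivalent to (28).

\[
\begin{aligned}
&\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [g(x_1, y_2, \ldots, y_n) - \delta^*(x_1, y_2, \ldots, y_n)]^2\,
 p(x_1 + \omega) \prod_{i=2}^{n} p(x_1 + \omega + y_i)\,dx_1\,dy_2 \cdots dy_n \\
&\qquad \le 2 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\delta^* - g][\delta^* + \omega]\,
 p(x_1 + \omega) \prod_{i=2}^{n} p(x_1 + \omega + y_i)\,dx_1\,dy_2 \cdots dy_n.
\end{aligned}
\tag{29}
\]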
CASE 1.

\[
p(\xi) \begin{cases} \ge 0, & -\infty < a \le \xi \le b < \infty, \\ = 0, & \xi < a \text{ or } b < \xi. \end{cases}
\]

Only estimators $g(x_1, y_2, \ldots, y_n)$ of $-\omega$ which satisfy

\[
|\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)| \le M
\]

for some constant $M < \infty$ need be considered. The argument is analogous to
that given for the one-dimensional case in Section 3. The underlying reason is
that the boundedness of the spectrum determines for each set of observations
$x_1, x_2, \ldots, x_n$ an interval within which the true value of $-\omega$ must lie, and
any estimator which produces a value outside this interval can be improved
upon. Thus any estimate which differs from $\delta^*$ and which improves in terms of
risk on $\delta^*$ can differ from $\delta^*$ by at most a fixed constant, regardless of the
observed values of $x$. The single regularity assumption required in this case is that
for all $\xi$, $0 \le p(\xi) \le C < \infty$.
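LEMMA 5. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, then

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2
 \left[ \int_{-\infty}^{\infty} p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)\,d\xi \right] dx_1\,dy_2 \cdots dy_n < \infty.
\]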


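PROOF. By the fundamental inequality (29)

\[
\begin{aligned}
&\int_{-n}^{n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2\,
 p(x_1 + \omega) \prod_{i=2}^{n} p(y_i + x_1 + \omega)\,dx_1\,dy_2 \cdots dy_n\,d\omega \\
&\qquad \le 2M \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |\Phi(x_1 + n, y_2, \ldots, y_n) - \Phi(x_1 - n, y_2, \ldots, y_n)|\,dx_1\,dy_2 \cdots dy_n,
\end{aligned}
\tag{31}
\]

where

\[
\Phi(u, y_2, \ldots, y_n) = \int_{-\infty}^{u} [\xi - T(y_2, \ldots, y_n)]\,p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)\,d\xi.
\]

By direct calculation we observe that

\[
\Phi(\infty, y_2, \ldots, y_n) = \Phi(-\infty, y_2, \ldots, y_n) = 0.
\tag{32}
\]

If we can show that
$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |\Phi(x_1, y_2, \ldots, y_n)|\,dx_1\,dy_2 \cdots dy_n < \infty$,
then it follows that the expression of (31) is uniformly bounded with respect to
$n$, which clearly implies the sought-for conclusion. The remainder of the proof
consists in verifying the finiteness of this integral.

Note that $\Phi(x_1, y_2, \ldots, y_n) \le 0$ for all $x_1, y_2, \ldots, y_n$, and for fixed
$y_2, \ldots, y_n$ there exists a constant $N$ such that $|x_1| \ge N$ implies

\[
\Phi(x_1, y_2, \ldots, y_n) = 0.
\]

Integration by parts with respect to $x_1$ yields

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |\Phi(x_1, y_2, \ldots, y_n)|\,dx_1\,dy_2 \cdots dy_n
 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 [x_1 - T(y_2, \ldots, y_n)]\,p(x_1) \cdots p(y_n + x_1)\,dx_1 \cdots dy_n.
\]

But the last integral converges absolutely since $p(\xi)$ vanishes outside a finite
interval and

\[
|T(y_2, \ldots, y_n)| \le |a| + |b|.
\]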
THEOREM 7. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, then $g = \delta^*$, a.e.


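PROOF. Let

\[
G(x_1, y_2, \ldots, y_n) = [x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]
 \left[ \int_{-\infty}^{\infty} p(u)\,p(y_2 + u) \cdots p(y_n + u)\,du \right]^{1/2}
\]

and

\[
H(x_1, y_2, \ldots, y_n) = [x_1 - T(y_2, \ldots, y_n)]\,p(x_1)\,p(y_2 + x_1) \cdots p(y_n + x_1)
 \left[ \int_{-\infty}^{\infty} p(u)\,p(y_2 + u) \cdots p(y_n + u)\,du \right]^{-1/2}.
\]

By Lemma 5, $G(x_1, y_2, \ldots, y_n) \in L^2$, and by direct calculation we see that
$H(x_1, y_2, \ldots, y_n) \in L^2$. Therefore, the Fourier transforms
$\hat{G}(t_1, \ldots, t_n)$ and $\hat{H}(t_1, \ldots, t_n)$ of $G$ and $H$, respectively,
are well-defined and belong to $L^2$. Consider the expression

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]
 [x_1 + \omega_1 - T(y_2 + \omega_2, \ldots, y_n + \omega_n)]\,
 p(x_1 + \omega_1) \cdots p(y_n + \omega_n + x_1 + \omega_1)\,dx_1\,dy_2 \cdots dy_n,
\tag{33}
\]

where for our purposes we shall need to evaluate this expression only for the
values $\omega_2 = \cdots = \omega_n = 0$. (33) is essentially a convolution of $G$ and $H$,
so its Fourier transform exists and is equal to

\[
\hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, t_2, \ldots, t_n) \in L^1.
\]

By the inversion property of Fourier transforms we obtain

\[
\begin{aligned}
&\int_{-n}^{n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \{x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)\}
 \{x_1 + \omega_1 - T(y_2, \ldots, y_n)\}\,
 p(x_1 + \omega_1) \cdots p(y_n + x_1 + \omega_1)\,dx_1 \cdots dy_n\,d\omega_1 \\
&\qquad = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, \ldots, t_n)\,
 \frac{e^{int_1} - e^{-int_1}}{it_1}\,dt_1 \cdots dt_n,
\end{aligned}
\tag{34}
\]

which is defined everywhere since $\hat{G} \cdot \hat{H}$ belongs to $L^1$. The last
integral is an absolutely convergent integral. To substantiate this assertion we note that
$\hat{H}(0, t_2, \ldots, t_n) = 0$, and

\[
\frac{\partial \hat{H}(t_1, \ldots, t_n)}{\partial t_1}
 = i \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[i(t_1 x_1 + t_2 y_2 + \cdots + t_n y_n)]\,x_1 [x_1 - T(y_2, \ldots, y_n)]
 \left[ \int_{-\infty}^{\infty} p(\xi) \cdots p(y_n + \xi)\,d\xi \right]^{-1/2}
 p(x_1)\,p(y_2 + x_1) \cdots p(y_n + x_1)\,dx_1 \cdots dy_n
\]

is bounded independently of $t_1, \ldots, t_n$. Hence, by the mean value theorem
$\hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, \ldots, t_n)/t_1$ as a function of $t_1$ is
integrable in a neighborhood about the origin for all $t_2, \ldots, t_n$. Therefore,

\[
\hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, \ldots, t_n)/t_1 \in L^1.
\]

By virtue of the Riemann-Lebesgue lemma and Lebesgue's theorem of dominated
convergence we see that the expression in (34) tends to zero as $n \to \infty$.
Hence, on comparison of (29) and (34), we deduce

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2
 \left[ \int_{-\infty}^{\infty} p(x_1 + \omega) \cdots p(y_n + x_1 + \omega)\,d\omega \right] dx_1 \cdots dy_n = 0,
\]

which establishes the theorem.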
CASE 2. Discrete case.

\[
X = \{0, \pm 1, \pm 2, \ldots\}, \qquad \Omega = \{0, \pm 1, \pm 2, \ldots\}
\]

and

\[
P\{X = i \mid \omega\} = p(i + \omega),
\]

where $p(j) \ge 0$, $\sum_{j=-\infty}^{\infty} p(j) = 1$.

We impose the following regularity assumption.

ASSUMPTION VI. $\sum_{j=-\infty}^{\infty} j^2 \sqrt{p(j)} < \infty$.

Unfortunately, we do not know whether this assumption may be relaxed to
the obviously weaker and more natural condition
$\sum_{j=-\infty}^{\infty} j^2 p(j) < \infty$. The weaker
requirement was indeed sufficient for the case of a single observation whenever
$\sum_{j} j\,p(j) = 0$. [See Section 3.]

LEMMA 6. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, then there exists
a constant $C$ such that

\[
\sum_{y_n} \cdots \sum_{x_1} [\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2\,
 p(x_1 + \omega) \cdots p(y_n + x_1 + \omega) \le C
\]

for all $\omega$ (under Assumption VI).

The proof is analogous to that of Lemma 2 of Section 3, so it is omitted.







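LEMMA 7. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, then

\[
\sum_{y_n} \cdots \sum_{x_1} [\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2
 \left[ \sum_{\omega} p(x_1 + \omega) \cdots p(y_n + x_1 + \omega) \right] < \infty.
\]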


Proor. As a consequence of Lemma 6, 





q@ 2 
1— T (y2, 2! Yn) — g(X1, Y2; eee » Yu) | s —— —_ — = 
V p(ti + w) +--+ (Yn + 21 + w) 
for all w. Since w is arbitrary, 


max | 21 — T(y2,+-+, yn) — glti, y2, *** y Yn) 


z A 


1/9 


(3? 


<= 





~ max Vp(j)p(yi + J) *** P(e + J) 
Define for integers $u$,

\[
\Phi(u, y_2, \ldots, y_n) = \sum_{j=-\infty}^{u} [j - T(y_2, \ldots, y_n)]\,p(j)\,p(y_2 + j) \cdots p(y_n + j).
\]


By the fundamental inequality (29) 
n 
, / \72 
DV des Lele — Tee, «++ yn) — gla, ya, -** » und] 
X plzi + w) +++ pn + 11 + w) 


qua 
SO 2 vo 2 meen _ aid 
y 


. = MAX y/p(j)p(y2 + J) -*- P(Ye + J) 





XK | B(zi + n, yr, +++, Yn) — Oli — n, yr, +++, Yn) |. 
It is easily checked that $\Phi(u, y_2, \ldots, y_n) \le 0$ for all $u, y_2, \ldots, y_n$,
and as $|u| \to \infty$, $|u\,\Phi(u, y_2, \ldots, y_n)| \to 0$ by Assumption VI with the
aid of the fact that $\Phi(\infty, y_2, \ldots, y_n) = 0$. Summation by parts with respect
to $x_1$ yields
on. «6s = (21, Yo, °°» Yn) ae 
Yn 71 Max V p(j)p(y2 + J) +++ P(Yn + J) 
? 





=>... Hae = Ts, --- yd pdpys + 2) --- Py» + 2) 
‘ z max V p(j)p(y2 + J) ++: P(Yn + J) 
(36) ' 





IA 


ae 


2>>-:-- zip(x)p(y2 + 21) «++ plyn + 2) 
gs “1 max V p(j)p(y2 + J) *** Pye + J) 


. 
, ' (ay) +--+ p(yn + 2) 
<2’... pst ese, 
Un Zi V p(x) -++ D(Yn + 21) 
where $\sum' \cdots \sum'$ denotes summation over all $x_1, \ldots, y_n$ for which

\[
p(x_1)\,p(y_2 + x_1) \cdots p(y_n + x_1) > 0.
\]

But by Assumption VI

\[
\sum_{y_n} \cdots \sum_{x_1} x_1^2 \sqrt{p(x_1) \cdots p(y_n + x_1)} < \infty.
\tag{37}
\]

Hence, (35), (36), and (37) in conjunction yield the desired result.
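THEOREM 8. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, then

\[
\delta^*(x_1, y_2, \ldots, y_n) = g(x_1, y_2, \ldots, y_n)
\]

for all $y_2, \ldots, y_n$ such that $\sum_{j} p(j)\,p(y_2 + j) \cdots p(y_n + j) > 0$.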
Proor. Let 


> 


2. P\J)ply2 + J) ++* Py» 

[p(u)p(ye 

: a G'(2, Y2,° »Yn) < & DY Lemma 7, 
“/ Vr 

>: > exp [i(tjay + --- n Yn) |G(a4 


= 
ad | 


Mentos: -@ 


converges in quadratic mean to a function $\hat{G}(t_1, \ldots, t_n) \in L^2(-\pi, \pi)$.
Also, by Assumption VI, $\sum_{y_n} \cdots \sum_{x_1} |H(x_1, y_2, \ldots, y_n)| < \infty$.
Indeed, inspection of the series shows that its convergence would be a consequence of
the convergence of the related series


y ju POOPY u ee Yo) ° "Pi u + Yn) 


am V pe ye + ye) -=* PET yo) 
This follows in view of the inequality of Schwarz, the uniform boundedness of 


p' (u) )p\ Cu - Y2) see plu - Yn) 
2 Pe P(E + ys) += PE + yn) 


and Assumption VI. Hence, 


* 5 ts 


> es Zz exp [i(t, ao t yn) |H (a1, Ye 


Un 


converges uniformly and absolutely to a function $\hat{H}(t_1, \ldots, t_n) \in L^2(-\pi, \pi)$.
The expression


I(wi,°*: ,w )j= =. cee >, [a1 — Tye, --*s Ya) — glti,Y2,°**, Yn) 


X [ar + wr — Tye + we, +++, Yn + wn)] 


X play + w+: + plyn + on + 21 +e), 
where in actuality $\omega_2 = \cdots = \omega_n = 0$, is essentially a convolution of $G$
and $H$, so its Fourier series converges absolutely to a function
$\hat{I}(t_1, \ldots, t_n)$, and
$\hat{I}(t_1, \ldots, t_n) = \hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, \ldots, t_n)$, a.e. Since


° ® 


a er (2n)* f si exp (—itwn)I (ty, «++ 5 te) dh «++ dtp 


and >.",e °° = fe" — "491771 — 6"), it follows that 


> I(w,0,---,0) = on -[ vee [ I(t; , +--+, ta) 


ein _ eo iter 
x| ar Jaw. dn. 
- < 


The interchange of summation and integration signs on the right-hand side of
(38) is valid since by virtue of Assumption VI

\[
\lim_{t_1 \to 0} \hat{I}(t_1, \ldots, t_n)/(1 - e^{it_1}) < \infty
\]

and $\hat{I}(t_1, \ldots, t_n) \in L^1(-\pi, \pi)$. But by the Riemann-Lebesgue lemma the
right-hand side of (38) converges to zero as $n \to \infty$. By the fundamental inequality

\[
\sum_{y_n} \cdots \sum_{x_1} [x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2
 \left[ \sum_{j} p(j) \cdots p(y_n + j) \right] \le 0,
\]

from which the desired result follows.

Case 3. General density functions. 

This case will include all density functions which satisfy the following regu- 
larity conditions: 

Assumption VII: 


0 < p(é) 


IIA 
CQ 
—~ 

8 


ASSUMPTION VIII:

\[
\frac{p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)}{\int_{-\infty}^{\infty} p(\theta)\,p(y_2 + \theta) \cdots p(y_n + \theta)\,d\theta} \le A
\]

for all $\xi, y_2, \ldots, y_n$.

Assumption VIII asserts that the conditional density of $x_1$ given $y_2, \ldots, y_n$
must remain bounded for all $x_1, y_2, \ldots, y_n$. This assumption is a bit stronger
than necessary; it could be replaced by an assumption of finiteness of a number
of definite integrals involving the conditional density. However, there seems to
be no gain involved in such a generalization. The class of densities which satisfy
Assumptions VII and VIII includes as two of its important members the normal
and negative exponential distributions as well as any density which
asymptotically dies off like a power.

It will be shown by Theorem 9 below that $\delta^*$ is admissible with respect to
the class of all estimators $g(x_1, y_2, \ldots, y_n)$ which satisfy the following
additional requirement:

ASSUMPTION IX. There exists a constant $M < \infty$ such that for all
$x_1, y_2, \ldots, y_n$,
$|\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)| \le M$.

This will establish a suitably broad form of the concept of "local"
admissibility for the estimator $\delta^*$. This concept of "local" admissibility was
introduced earlier in Section 3. As yet suitable supplementary conditions on the form of
$p(\xi)$ for the relaxation of this assumption have not been obtained.

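LEMMA 8. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, and $g$ satisfies
Assumption IX, then

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} [\delta^*(x_1, y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]^2
 \left[ \int_{-\infty}^{\infty} p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)\,d\xi \right] dx_1\,dy_2 \cdots dy_n < \infty.
\]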


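PROOF. The proof is analogous to that of Lemma 5. It is sufficient to prove
that

\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |\Phi(x_1, y_2, \ldots, y_n)|\,dx_1\,dy_2 \cdots dy_n < \infty,
\]

where $\Phi$ is defined as in Lemma 5. $\Phi(x_1, y_2, \ldots, y_n) \le 0$, and as
$|u| \to \infty$, $|u\,\Phi(u, y_2, \ldots, y_n)| \to 0$. As $u \to -\infty$,

\[
\begin{aligned}
0 \le u\,\Phi(u, y_2, \ldots, y_n)
 &\le \int_{-\infty}^{u} \xi^2\,p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)\,d\xi \\
 &\quad + |T(y_2, \ldots, y_n)| \int_{-\infty}^{u} |\xi|\,p(\xi)\,p(y_2 + \xi) \cdots p(y_n + \xi)\,d\xi.
\end{aligned}
\tag{39}
\]

Both integrals in (39) vanish as $u \to -\infty$ by Assumption VII. A similar
analysis is valid as $u \to \infty$ since

\[
\int_{-\infty}^{u} [\xi - T(y_2, \ldots, y_n)]\,p(\xi) \cdots p(y_n + \xi)\,d\xi
 = -\int_{u}^{\infty} [\xi - T(y_2, \ldots, y_n)]\,p(\xi) \cdots p(y_n + \xi)\,d\xi.
\]

Integration by parts with respect to $x_1$ and an application of Schwarz's
inequality yields

\[
\begin{aligned}
0 \le -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Phi(x_1, y_2, \ldots, y_n)\,dx_1\,dy_2 \cdots dy_n
 &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 (x_1 - T(y_2, \ldots, y_n))\,p(x_1) \cdots p(y_n + x_1)\,dx_1 \cdots dy_n \\
 &\le 2 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1^2\,p(x_1) \cdots p(y_n + x_1)\,dx_1 \cdots dy_n.
\end{aligned}
\]

The final expression is therefore finite by virtue of Assumption VII.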

THEOREM 9. If $\rho(\omega, g) \le \rho(\omega, \delta^*)$ for all $\omega$, and $g$
satisfies Assumption IX, then $g = \delta^*$, a.e.

PROOF. Define $G(x_1, y_2, \ldots, y_n)$ and $H(x_1, y_2, \ldots, y_n)$ as in Theorem 7.
By Lemma 8, $G(x_1, y_2, \ldots, y_n) \in L^2$, and by Assumptions VII and VIII,
$H(x_1, y_2, \ldots, y_n) \in L^2$. Therefore, the Fourier transforms
$\hat{G}(t_1, \ldots, t_n)$ and $\hat{H}(t_1, \ldots, t_n)$ of $G$ and $H$, respectively,
are well-defined and belong to $L^2$. The Fourier transform $\hat{I}(t_1, \ldots, t_n)$ of

\[
I(\omega_1, \ldots, \omega_n) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
 [x_1 - T(y_2, \ldots, y_n) - g(x_1, y_2, \ldots, y_n)]
 [x_1 + \omega_1 - T(y_2 + \omega_2, \ldots, y_n + \omega_n)]\,
 p(x_1 + \omega_1) \cdots p(y_n + \omega_n + x_1 + \omega_1)\,dx_1\,dy_2 \cdots dy_n
\]

is well-defined and equals $\hat{G}(-t_1, \ldots, -t_n)\,\hat{H}(t_1, \ldots, t_n)$. By an
argument analogous to that of Theorem 7 it follows that

\[
\lim_{n \to \infty} \int_{-n}^{n} I(\omega_1, 0, \ldots, 0)\,d\omega_1 = 0,
\]

which proves the result.
ON THE ESTIMATION OF PARAMETERS RESTRICTED
BY INEQUALITIES¹


By H. D. Brunk 
University of Missouri 


1. Summary. There are collected in this paper several observations and 
results more or less loosely related by their connections with the subject men- 
tioned in the title. The discussion moves from the general to the specific, be- 
ginning with some remarks on minimization of convex functions subject to side 
conditions, and ending with a discussion of uniform consistency of estimators of 
linearly ordered parameters. 

Section 2 deals with one aspect of the problem of minimizing a function of 
several variables, subject to side conditions which specify that the variables 
must satisfy certain inequalities. It is frequently true in such problems that 
information as to which of the restricting sets contain the minimizing point on 
their boundaries is of great assistance in finding this point. Theorem 2.1 provides 
the basis for a stepwise procedure leading to this information when both the 
function to be minimized and the restricting sets are convex. It makes no con- 
tribution, however, to the problem of finding the minimizing point on a given 
boundary or intersection of boundaries. 


Brief mention is made in Section 3 of some examples of estimation problems 
for which the remark to which Section 2 is devoted is appropriate. 


Section 4 is concerned with a situation in which samples are taken from k 
populations, each known to belong to a given one-parameter “exponential 
family”. The problem is the maximum likelihood estimation of the k parameters 
determining the populations, subject to certain restrictions. Methods are dis- 
cussed of finding the minimizing point on a given intersection of boundaries of 
restricting sets. In the particular case when all populations belong to the same
exponential family and when the restrictions on the parameters are order
restrictions, it is observed that the maximum likelihood estimators (MLE's) of
the means are independent of the particular exponential family.

Section 5 discusses a property, related to sufficiency, of the MLE's
discussed in Section 4. Let $y$ denote a vector representing a set of possible values
of the MLE's, $E$ a Borel subset of the sample space, $\tau$ a parameter point, $S_0$ the
intersection of the restricting sets. If $S_0$ is bounded by hyperplanes, there is a
determination of the conditional probability $p_\tau(E \mid y)$ which is independent of
$\tau$ when $y$ is interior to $S_0$, and, when $y$ lies on a face, edge, or vertex of $S_0$,
is independent of $\tau$ on the closure of that face, edge, or vertex. This result may


Received September 10, 1956; revised December 16, 1957.

¹ This research was supported by the United States Air Force through the Air Force
Office of Scientific Research of the Air Research and Development Command under
contract No. AF 18(600)-1108.




be regarded as a generalization of a remark ([16], p. 77) to the effect that if $X$
and $Y$ are normally distributed random variables with unit standard deviation
and means $\xi$ and $\eta$ respectively, and if $\xi$ and $\eta$ are known to satisfy a
linear equation, then the foot of the perpendicular from the observation point $(x, y)$
to the line is a sufficient estimator.

Section 6 is devoted to the same problem as are Sections 4 and 5, except that 
the parameters are linearly ordered, and that the populations need not belong 
to exponential families. Conditions are obtained for the strong uniform con- 
sistency of an estimator which is the MLE when the populations do belong to 
the same exponential family. An asymptotic lower bound is given for the proba- 
bility of achieving a given precision uniformly. 


2. Minimizing a convex function on the intersection of closed convex sets.
(The author's thanks are due the referee, whose suggestions have materially
improved the exposition in this section.) Let $y = (y_1, y_2, \ldots, y_k)$ denote the
generic point of $R_k$, Euclidean space of $k$ dimensions, and let $G(y)$ be a lower
semi-continuous function such that $\{y : G(y) \le a\}$ is bounded for each $a$,
satisfying

\[
G[\lambda y' + (1 - \lambda) y''] \le \max[G(y'), G(y'')]
\tag{2.1}
\]

for $0 \le \lambda \le 1$, and for all $y'$, $y''$ in its (convex) domain of definition.
(This form of condition (2.1) is due to the referee.) In particular, $G$ satisfies (2.1) if
$G$ is convex.

For an arbitrary set $A \subset R_k$, let $\mathcal{B}(A)$ denote its boundary. We write
$A \subset B$ if $A$ is properly contained in $B$ or if $A = B$. Let $\phi$ denote the
empty set. Let there be given a finite number of intersecting closed convex sets $A_i$
$(i = 1, 2, \ldots, N)$. We assume $G$ defined on a convex set containing
$\bigcup_{i=1}^{N} A_i$. We define $Q_i$ to be the set on which $G(y)$ achieves its minimum
value for $y \in A_i$, $i = 1, 2, \ldots, N$. For a set $i_1, i_2, \ldots, i_r$ of distinct
positive integers not greater than $N$ we define $Q_{i_1, i_2, \ldots, i_r}$ to be the set
on which $G(y)$ achieves its minimum value for
$y \in A_{i_1} A_{i_2} \cdots A_{i_r}$.

THEOREM 2.1. Let $A_1$, $A_2$ be intersecting closed sets, $A_1$ convex. Then either
$Q_{12} \subset Q_1$ or $Q_{12}\,\mathcal{B}(A_2) \ne \phi$.

PROOF. If $Q_1 A_2 \ne \phi$ then obviously $Q_1 \supset Q_{12}$. It remains to consider
the situation in which $Q_1 A_2 = \phi$. Let $p \in Q_1$, $q \in Q_{12}$. Since $A_1$ is
convex, the segment $pq$ lies in $A_1$. Since $p \notin A_2$, $q \in A_2$, there is a point
$r$ on $pq$ such that $r \in A_1\,\mathcal{B}(A_2)$. By property (2.1),
$G(r) \le G(q)$, hence $r \in Q_{12}$. This completes the proof of Theorem 2.1.

COROLLARY 2.1. If $G(y)$ is lower semi-continuous, if $\{y : G(y) \le a\}$ is bounded
for each $a$ and if $G$ satisfies

\[
G[\lambda y' + (1 - \lambda) y''] < \max[G(y'), G(y'')]
\tag{2.2}
\]

for $0 < \lambda < 1$, and for all $y'$, $y''$ in its (convex) domain of definition, and if
$A_1$, $A_2$ are intersecting closed convex sets in its domain of definition, then $Q_1$
and $Q_{12}$ consist of single points, $q_1$ and $q_{12}$; either $q_1 = q_{12}$ or
$q_{12} \in \mathcal{B}(A_2)$. We note that a strictly convex function $G$ satisfies (2.2).

Corollary 2.1 justifies the procedure outlined in the following paragraph for
minimizing $G$ subject to the condition $y \in A_1 A_2 \cdots A_N$, where
$A_1, A_2, \ldots, A_N$ are given intersecting closed convex sets. In many particular
instances of this problem, one of the chief difficulties is that of determining which of
the sets $A_i$ contain the solution (a point minimizing $G$) on their boundaries, when the
point at which $G$ attains its unrestricted minimum is not in $A_1 A_2 \cdots A_N$.
The procedure described below can be used to determine those sets among
$A_1, A_2, \ldots, A_N$ on whose boundaries the solution lies. We remark that $G$ need not
be convex in order for the method to apply, provided it is lower semi-continuous
and satisfies (2.2).

The first step is to determine the point at which $G$ assumes its unrestricted
minimum. If this point lies in $A_1 A_2 \cdots A_N$, it is the solution. If not, one of
the sets is selected in which it does not lie, and designated as $A_1$ (relabelling, if
necessary). Now consider the problem of minimizing $G$ subject to $y \in A_1$. Applying
Corollary 2.1, with $A_1$ there replaced by the whole space in this application,
and $A_2$ there by $A_1$ in this application, we find that the solution, $q_1$, lies on
$\mathcal{B}(A_1)$. It may be that $q_1$ lies in $A_1 A_2 \cdots A_N$, in which case it is
the solution. If not, we designate as $A_2$ (relabelling, if necessary) one of the sets
which does not contain $q_1$. We now consider the problem of minimizing $G$ subject to
$y \in A_1 A_2$. By Corollary 2.1, the solution $q_{12}$ lies on $\mathcal{B}(A_2)$. We
find first the point $q_2$ where $G$ is minimized subject to $y \in A_2$. If
$q_2 \in A_1 A_2$, then $q_{12} = q_2$ is the solution of the present limited problem.
Otherwise, by another application of Corollary 2.1,
$q_{12} \in \mathcal{B}(A_1)\,\mathcal{B}(A_2)$, etc.

This stepwise procedure was introduced in situations involving certain functions $G$ and convex sets $A_i$ described by inequalities of the form $y_i \le y_j$ by van Eeden ([13], Theorem I, p. 445; [14], Theorem II, p. 134). The stepwise procedure outlined above makes no contribution to the problem of finding the point where $G$ is minimized on a given "extended hyperface" $\mathcal{B}(A_{i_1}) \cdots \mathcal{B}(A_{i_n})$. Further, in special cases it may even occur that one will determine the minimizing point on each of the $2^N - 1$ "extended hyperfaces" before finding a minimizing point in $A_1 A_2 \cdots A_N$. Usually, however, one will expect the procedure to terminate with the solution long before all "extended hyperfaces" have been examined.

Non-linear programming methods have been developed for solving certain 
problems of this class (see, for example, [3]). Problems arising from some of the 
applications discussed below are such that it is relatively easy to find the mini- 
mizing point on a given “extended hyperface’’, and some trial calculations with 
such problems using the above stepwise procedure resulted in far less lengthy 
calculations than did those using general nonlinear programming methods. 


3. Examples.
(i) In the bioassay type of problem, one is required to minimize a convex function of the form

(3.1)  $-\sum_{i=1}^{k} [a_i \log y_i + b_i \log (1 - y_i)]$,

where the $a_i$ and $b_i$ are given numbers, and the $y_i$ are subject to the restriction $0 \le y_1 \le y_2 \le \cdots \le y_k \le 1$.


Even if one is not willing to assume a particular form for the distribution function and is thus led to this nonparametric formulation, he may feel that, for example, the distribution function should not rise too rapidly, and be led to impose further conditions of the form

(3.2)  $y_{i+1} - y_i \le c_i$  or  $y_{i+2} - 2y_{i+1} + y_i \le d_i$,

where the $c_i$ and $d_i$ are prescribed numbers. The problem remains in the class discussed in Section 2; however, the minimizing point on the boundary of a set described by inequalities of the form (3.2) is not in general so easily found as is that on a boundary $y_i = y_{i+1}$. The fact that the partial derivatives of the function (3.1) are so readily determined suggests that the method of Lagrange multipliers, together with Newton's (multivariate) method for the solution of simultaneous equations, may prove appropriate.


A similar but simpler problem might conceivably arise in connection with ordinary random sampling. Let $x_1, \ldots, x_n$ be sample values of a sample of size $n$ from a population with unknown distribution function $F$, and let $p_1, p_2, \ldots, p_n$ be the saltuses or jumps of $F$ at the sample values. The MLE's of $p_1, p_2, \ldots, p_n$ maximize $\prod p_i$, or minimize $-\sum \log p_i$, subject to the restriction $\sum_{i=1}^{n} p_i = 1$, and are given by $p_i = 1/n$, $i = 1, 2, \ldots, n$, furnishing the empiric distribution function. But now if we suppose further conditions put on $F$, perhaps of the form $F(x_{i+1}) - F(x_i) \le c(x_{i+1} - x_i)$ or $p_i \le c(x_{i+1} - x_i)$, $i = 1, 2, \ldots, n - 1$, the remark of Section 2 may prove useful.

(ii) In the example on page 833 in [6], one is given $\{a_{ij}\}$, $\{n_{ij}\}$, and required to choose $\{p_{ij}\}$ so as to minimize

$-\sum_{i,j} [a_{ij} \log p_{ij} + (n_{ij} - a_{ij}) \log (1 - p_{ij})]$.

Here $p_{ij} = 1 - F(x_i, y_j)$, where $x_i$, $i = 1, 2, \ldots, n$, and $y_j$, $j = 1, 2, \ldots, k$, are given, and where $F(x, y)$ is an unknown bivariate distribution function, so that not only is it required to be monotone in the two variables separately, but also second differences are to be positive.

(iii) Let a person chosen at random from a group have a probability $U$ of contracting a certain disease in unit time; $U$ is to be considered a random variable, with distribution function $F$. If a particular person has probability $u$, then the probability that he will be infected for the first time during a second unit of time is $(1 - u)u$, infected for the first time during a third is $(1 - u)^2 u$, etc. Thus the probability that a person chosen at random will become infected during the first unit of time is

$\int_0^1 u \, dF(u) = \int_0^\infty (1 - e^{-t}) \, dG(t)$,

where $G(t) = F(1 - e^{-t})$; the probability that he will first become infected during the $j$th unit of time is

$p_j = \int_0^1 (1 - u)^{j-1} u \, dF(u) = \int_0^\infty e^{-(j-1)t} (1 - e^{-t}) \, dG(t)$,  $j = 1, 2, \ldots$.

If we set $q_j = \int_0^\infty e^{-jt} \, dG(t)$, $j = 0, 1, 2, \ldots$, then $p_j = q_{j-1} - q_j$, $j = 1, 2, \ldots$, and $q_j = 1 - \sum_{i \le j} p_i$, $j = 1, 2, \ldots$. Since $G$ is a distribution function, we have

$\Delta q_j = q_{j+1} - q_j = -p_{j+1} \le 0$,

$\Delta^2 q_j = q_{j+2} - 2q_{j+1} + q_j = -(p_{j+2} - p_{j+1}) \ge 0$, etc.
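The relations between the $p_j$ and the $q_j$ can be checked on a small example. The following is a minimal sketch, assuming a hypothetical three-point distribution for $U$ (the support and weights are illustrative, not taken from the text); it uses exact rational arithmetic and the identity $q_j = E[(1 - U)^j]$, which follows from the substitution $u = 1 - e^{-t}$.

```python
from fractions import Fraction

def first_infection_probs(support, weights, k):
    """p_j = E[(1 - U)^(j-1) U] for j = 1..k, for a discrete U."""
    return [sum(w * (1 - u) ** (j - 1) * u for u, w in zip(support, weights))
            for j in range(1, k + 1)]

def escape_probs(support, weights, k):
    """q_j = E[(1 - U)^j] for j = 0..k, for a discrete U."""
    return [sum(w * (1 - u) ** j for u, w in zip(support, weights))
            for j in range(k + 1)]

# Hypothetical mixing distribution for U (an assumption for illustration).
support = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
k = 6

p = first_infection_probs(support, weights, k)   # p[j-1] holds p_j
q = escape_probs(support, weights, k)            # q[j] holds q_j

# p_j = q_{j-1} - q_j
identity_ok = all(p[j - 1] == q[j - 1] - q[j] for j in range(1, k + 1))
# First differences of q are non-positive, second differences non-negative.
first_diffs = [q[j + 1] - q[j] for j in range(k)]
second_diffs = [q[j + 2] - 2 * q[j + 1] + q[j] for j in range(k - 1)]
```

The sign pattern of the successive differences is what supplies the linear inequality restrictions in the estimation problem.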
Suppose that of $n$ persons initially chosen at random, $x_j$ first become infected during the $j$th unit of time, $j = 1, 2, \ldots, k$, and that $x_{k+1} = n - \sum_{j=1}^{k} x_j$ fail to become infected during the first $k$ units of time. The MLE's of the probabilities $p_j$ ($j = 1, 2, \ldots, k$) and $1 - \sum_{j=1}^{k} p_j$ are the solutions $y_1, y_2, \ldots, y_k, y_{k+1}$ of the following problem: to minimize

$-\sum_{j=1}^{k+1} x_j \log y_j$

subject to

$\sum_{j=1}^{k+1} y_j = 1$,

$y_{j+2} - 2y_{j+1} + y_j \ge 0$,  $j = 1, 2, \ldots, k - 2$, etc.

The problem may be made to fit precisely the pattern of Section 2 if we replace

$\sum_{j=1}^{k+1} y_j = 1$  by  $\sum_{j=1}^{k+1} y_j \le 1$;

the altered problem clearly has the same solution.


4. Exponential families. The remark to which Section 2 is devoted is especially appropriate for the problem of estimating parameters using samples from populations belonging to exponential families (cf. [2]; [4]; [17], pp. 64, 68; [24]); more particularly, when the restrictions on the parameter point are expressed by inequalities which are linear in its coordinates.

Let $F(x)$ be a distribution function. The integral

$\varphi(\tau) = \int e^{\tau x} \, dF(x)$,

giving its moment-generating function, converges to 1 for $\tau = 0$; we shall suppose its interval of convergence contains the origin as an interior point. It then converges in a vertical strip of the complex $\tau$ plane containing the origin to an analytic function which is positive on the real axis. We set

$\Theta(\tau) = \log \varphi(\tau)$,

using the principal value of the logarithm (which is real when $\varphi(\tau) > 0$, hence when $\tau$ is real); $\Theta(\tau)$ is analytic for real $\tau$ in the interval of convergence.

DEFINITION. The distribution functions $F(x; \tau)$ form an exponential family, the family of exponential type determined by $F(x)$ or by $\Theta(\tau)$, if, for $\tau$ in the interval of convergence,

(4.1)  $F(x; \tau) = \int_{-\infty}^{x} \exp [u\tau - \Theta(\tau)] \, dF(u)$.

It was shown by Koopman [20] and by Pitman [23] that, except for change in variable or change in parameter, a (sufficiently regular) one-parameter family of distributions over a common, fixed (possibly infinite) interval admits a sufficient statistic only if the parameter enters as does the parameter $\tau$ in (4.1). Further, it is clear from the derivation in [11] of the Cramér-Rao inequality that in the above statement the term "sufficient" may be replaced by "efficient" (as defined in [11]).

If $X_\tau$ is a random variable whose distribution function $F(x; \tau)$ is given by (4.1), then its expectation and variance are given by

(4.2)  $E(X_\tau) = \theta(\tau)$,  $V(X_\tau) = \theta'(\tau)$,

where

(4.3)  $\theta(\tau) = \Theta'(\tau)$.

Since $V(X_\tau) \ge 0$ it follows that $\theta(\tau)$ is increasing and $\Theta(\tau)$ convex; indeed, $\theta(\tau)$ is strictly increasing, and $\Theta(\tau)$ strictly convex, unless $F(x)$ is degenerate, a possibility we shall rule out from further consideration.
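The statements that the mean and variance of $X_\tau$ are the first and second derivatives of $\Theta$ can be checked numerically. The following is a minimal sketch, assuming as an illustration that the base distribution $F$ places mass 1/2 at each of 0 and 1, so that $\varphi(\tau) = (1 + e^{\tau})/2$; the function names and the finite-difference step are hypothetical choices, not part of the text.

```python
import math

def cumulant(tau):
    """Theta(tau) = log phi(tau) for a base F with mass 1/2 at 0 and at 1."""
    return math.log((1.0 + math.exp(tau)) / 2.0)

def tilted_moments(tau):
    """Mean and variance of the member F(x; tau), computed directly
    from the weights exp[u*tau - Theta(tau)] dF(u) at u = 0 and u = 1."""
    weights = [0.5 * math.exp(u * tau - cumulant(tau)) for u in (0, 1)]
    mean = 0 * weights[0] + 1 * weights[1]
    variance = (0 - mean) ** 2 * weights[0] + (1 - mean) ** 2 * weights[1]
    return mean, variance, sum(weights)

def first_derivative(f, x, h=1e-4):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

checks = []
for tau in (-1.0, 0.0, 0.7):
    mean, variance, total = tilted_moments(tau)
    checks.append((abs(total - 1.0),
                   abs(mean - first_derivative(cumulant, tau)),
                   abs(variance - second_derivative(cumulant, tau))))
```

Each entry of `checks` holds the normalization error and the discrepancies between the directly computed moments and the numerical derivatives of $\Theta$; all three should be small.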


We define $\tau(\theta)$ as the inverse function of $\theta(\tau)$, and $T(\theta)$ by

$T(\theta) = \int_{\theta_0}^{\theta} \tau(v) \, dv$,

where $\theta_0 = \theta(0)$. Evidently $T(\theta)$ is convex, and assumes its minimum value, 0, at $\theta_0$. According to an inequality of W. H. Young ([18], p. 111), we have

(4.4)  $T(x) + \Theta(y) - xy \ge 0$,

with equality holding if and only if $y = \tau(x)$ ($x = \theta(y)$). This becomes geometrically obvious on interpreting $T$ and $\Theta$ relative to the graph of $y = \tau(x)$ or $x = \theta(y)$ in the $xy$ plane.
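Inequality (4.4) can be illustrated in a case where every function is available in closed form. The following is a minimal sketch, assuming the base distribution is Poisson with mean 1 (an illustrative choice), for which $\Theta(y) = e^{y} - 1$, $\theta(y) = e^{y}$, $\tau(x) = \log x$, $\theta_0 = 1$, and $T(x) = x \log x - x + 1$; the grid of test points is arbitrary.

```python
import math

def Theta(y):
    """Theta(y) = log phi(y) for a Poisson base distribution with mean 1."""
    return math.exp(y) - 1.0

def tau_of(x):
    """Inverse of theta(y) = Theta'(y) = exp(y)."""
    return math.log(x)

def T(x):
    """T(x) = integral of tau(v) dv from theta_0 = 1 to x."""
    return x * math.log(x) - x + 1.0

def young_gap(x, y):
    """Left-hand member of (4.4)."""
    return T(x) + Theta(y) - x * y

xs = [0.1 * i for i in range(1, 51)]        # x in (0, 5]
ys = [-2.0 + 0.1 * i for i in range(41)]    # y in [-2, 2]

smallest_gap = min(young_gap(x, y) for x in xs for y in ys)
largest_gap_on_curve = max(abs(young_gap(x, tau_of(x))) for x in xs)
```

Here `smallest_gap` should be non-negative up to rounding, and `largest_gap_on_curve` should vanish up to rounding, reflecting equality on the curve $y = \tau(x)$.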

We note that (i) a normal distribution with variable mean and fixed standard deviation, (ii) a Poisson distribution with variable mean, (iii) the distribution of the square of a normally distributed random variable having zero mean and variable variance, (iv) a binomial distribution with variable mean, and (v) a negative binomial distribution with variable parameter $p$, are examples of exponential families. If $X_0$ is any random variable whose moment generating function exists on an open interval containing the origin, there is an exponential family of distributions admitting the distribution of $X_0$ as a member for one parameter value. In random sampling from a population of this family, the sample mean is the MLE of $E(X_\tau)$ (cf. discussion of (4.6) below); it is also the least squares estimator; it is unbiased, consistent, sufficient, and efficient.

Let us now consider an estimation problem. Let $k$ be a positive integer. For $i = 1, 2, \ldots, k$, consider a population whose distribution belongs to the exponential family determined by a given distribution function $F_i(x)$ for a particular parameter value $\tau_i$, regarded as unknown. Let $x_i = (x_{1,i}, x_{2,i}, \ldots, x_{n_i,i})$ denote the set of sample values of a sample of size $n_i$ from the $i$th population, and set $x = (x_1, \ldots, x_k)$. Let $\bar{x}_i$ denote the sample mean ($i = 1, 2, \ldots, k$), and let $\bar{x}$ denote the point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)$ in the Euclidean space $R_k$ of $k$ dimensions. If $E$ is an event in the sample space, its probability is given by

(4.5)  $P_\tau(E) = \int_E \exp \left\{ \sum_{i=1}^{k} n_i [\bar{x}_i \tau_i - \Theta_i(\tau_i)] \right\} dP_0(x)$,

where

$P_0(E) = \int_E \prod_{i=1}^{k} \prod_{j=1}^{n_i} dF_i(x_{j,i})$.


Set $\tau = (\tau_1, \ldots, \tau_k)$, $y = (y_1, \ldots, y_k)$. The MLE of $\tau$ is that point $y = \tau^*$ which maximizes $\sum_{i=1}^{k} n_i [\bar{x}_i y_i - \Theta_i(y_i)]$; or equivalently, which minimizes

(4.6)  $G(y) = \sum_{i=1}^{k} n_i [T_i(\bar{x}_i) + \Theta_i(y_i) - \bar{x}_i y_i]$.

This function is convex in $y$. It is clear from inequality (4.4) and the remark following it that the unrestricted minimum is afforded by $\tau^* = (\tau_1^*, \tau_2^*, \ldots, \tau_k^*)$, where $\tau_i^* = \tau_i(\bar{x}_i)$ (the special case $k = 1$ was mentioned above). Suppose that restrictions on $\tau$ may be expressed by $\tau \in A_1 A_2 \cdots A_N$, where $A_i$ is the closure of an open convex subset of $R_k$ ($i = 1, 2, \ldots, N$). We consider now the subproblem of minimizing $G$ on a given intersection of boundaries of some of the sets $A_i$. Assuming the boundaries of the sets $A_i$ sufficiently regular, if the unrestricted minimum of $G$ is attained outside $A_j$, then the point $\tau^* = (\tau_1^*, \ldots, \tau_k^*)$ at which $G$ assumes its minimum on $\mathcal{B}(A_j)$ satisfies

(4.7)  $\sum_{i=1}^{k} a_i^r(\tau^*) \, n_i [\theta_i^* - \bar{x}_i] = 0$,  $r = 1, 2, \ldots, k - 1$,

where, for $r = 1, 2, \ldots, k - 1$, $a^r(\tau^*) = [a_1^r(\tau^*), \ldots, a_k^r(\tau^*)]$ is one of $k - 1$ independent vectors tangent at $\tau^*$ to $\mathcal{B}(A_j)$, and where $\theta_i^* = \theta_i(\tau_i^*)$. Similarly, the condition that $G$ assume its minimum on an "edge" $\mathcal{B}(A_{i_1}) \mathcal{B}(A_{i_2}) \cdots \mathcal{B}(A_{i_n})$ is (4.7) for $r = 1, 2, \ldots, k - n$, where $a^r(\tau^*)$ is one of $k - n$ independent vectors tangent at $\tau^*$ to $\mathcal{B}(A_{i_1}) \mathcal{B}(A_{i_2}) \cdots \mathcal{B}(A_{i_n})$. Thus the point minimizing $G$ on a given boundary or intersection of boundaries is a solution of equations of form (4.7). If, in particular, the boundaries $\mathcal{B}(A_i)$ are all hyperplanes, then the $a_i^r$ are constant on a given intersection of boundaries, and values $\theta_i^*$ of the means corresponding to the coordinates $\tau_i^*$ of the minimizing point are solutions of linear equations of the form

$\sum_{i=1}^{k} a_i^r n_i [\theta_i^* - \bar{x}_i] = 0$.


If the restricting conditions require that $\theta = (\theta_1, \ldots, \theta_k)$, rather than $\tau$, belong to the intersection of closed convex sets, their maps in the $\tau$-space need not in general be convex, and the above discussion need not apply. There are a number of situations of interest, however, in which the above technique for finding the minimizing point on a given intersection of boundaries will still be applicable.

(i) The function of $\theta$, $G[\tau(\theta)]$, obtained by replacing $y$ in (4.6) by $\tau(\theta)$, may be convex in $\theta$. For example, this will be the case if each population is normal with known variance, or binomial, or Poisson. Since the transformation from $\theta$-space to $\tau$-space is 1-1 and analytic, the above discussion for finding the minimizing point in $\tau$-space will apply even though the restricting sets in $\tau$-space may not be convex.

(ii) All populations belong to the same exponential family, and only order restrictions are made on the parameters; that is, the regions $A_1, \ldots, A_N$ are defined by inequalities of the form $\theta_i \le \theta_j$. In this case $\Theta(\tau) = \Theta_i(\tau)$ is independent of $i$, and $\tau_i \le \tau_j$ if and only if $\theta_i = \theta(\tau_i) \le \theta(\tau_j) = \theta_j$, since $\theta(\tau)$ and $\tau(\theta)$ are strictly increasing. The independent vectors $a^r$ for a given "edge" in this case are determined by the indices $i$ of the boundaries intersecting in the edge, independently of the particular function. The MLE (cf. Section 6 for a specific description in a special case) of $\theta$ is therefore independent of the particular exponential family to which the populations belong, provided they all belong to the same exponential family, and provided only order restrictions are made on the parameters $\theta_i$, $i = 1, 2, \ldots, k$. In particular, for the purpose of determining the MLE's of the means, one could in such a situation assume without loss of generality that the populations are all normal with standard deviation 1, but with possibly different means, satisfying the specified order restrictions. (In the special case where the order restrictions specify a simple ordering of the means, the failure of the MLE's to depend on the particular exponential family was noted in [6] and in [7].) Thus in this situation the problem of finding the MLE reduces to that of minimizing the function

$\sum_{i=1}^{k} n_i (\bar{x}_i - \theta_i)^2$

subject to specified restrictions of the form $\theta_i \le \theta_j$. With an obvious linear change of variable, it can be expressed as the problem of finding the foot of the segment of smallest length from a given point onto a set bounded by hyperplanes passing through the origin.
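The independence of the order-restricted MLE from the particular exponential family can be illustrated in the smallest case, $k = 2$. The following is a rough sketch, assuming hypothetical sample means and sample sizes that violate the restriction $\theta_1 \le \theta_2$, and using a crude grid search (step and range are arbitrary) in place of an exact optimizer; it compares a Poisson likelihood with the weighted sum of squares.

```python
import math

def poisson_objective(theta, means, sizes):
    """Theta-dependent part of the Poisson log-likelihood (to be maximized)."""
    return sum(n * (x * math.log(t) - t) for t, x, n in zip(theta, means, sizes))

def normal_objective(theta, means, sizes):
    """Negative weighted sum of squares (to be maximized)."""
    return -sum(n * (x - t) ** 2 for t, x, n in zip(theta, means, sizes))

def best_ordered_pair(objective, means, sizes, step=0.05, upper=6.0):
    """Grid search over pairs (t1, t2) with t1 <= t2."""
    grid = [step * i for i in range(1, int(round(upper / step)) + 1)]
    best, best_value = None, -math.inf
    for t1 in grid:
        for t2 in grid:
            if t1 <= t2:
                value = objective((t1, t2), means, sizes)
                if value > best_value:
                    best, best_value = (t1, t2), value
    return best

# Hypothetical data: the sample means are in the wrong order.
means, sizes = (5.0, 1.0), (1, 3)
pooled = sum(n * x for x, n in zip(means, sizes)) / sum(sizes)

poisson_hat = best_ordered_pair(poisson_objective, means, sizes)
normal_hat = best_ordered_pair(normal_objective, means, sizes)
```

Under these assumptions both searches should return the pooled mean in both coordinates, in agreement with the statement that only the sample means and sample sizes matter.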
5. A sufficiency property. Let us consider for a moment the simplest case of estimating a restricted parameter. We sample from a single population, belonging to an exponential family. The parameter $\theta$ is known to lie in a proper subinterval of its natural range. The MLE, $\bar{x}$, of the unrestricted parameter is known to be consistent, efficient, sufficient, and unbiased. It seems to the author that a "reasonable" estimator of the restricted parameter is $\bar{x}$, appropriately truncated, which is also the MLE. This estimator is not sufficient (nor unbiased). Likewise, in the more general situation discussed in Section 4, the MLE is not sufficient. However, it does possess a certain "sufficiency-like" property, expressed in Theorem 5.1. Referring to the general problem formulated in Section 4, we suppose that the parameter point $\tau = (\tau_1, \ldots, \tau_k)$ is subject to the restriction $\tau \in S_0 = A_1 A_2 \cdots A_N$, where now each $A_i$ is a closed set bounded by a hyperplane. (In the event that all populations are normal with the same standard deviation, or that all populations belong to the same exponential family and each of the sets is defined by an inequality of the form $\tau_i \le \tau_j$, the corresponding sets in $\theta$-space will also be bounded by hyperplanes.) Let $x$ denote a point of the sample space, and let $Y(x) = [Y_1(x), Y_2(x), \ldots, Y_k(x)]$ denote the corresponding MLE of $\tau$, subject to $\tau \in S_0$. For a Borel set $E$ in the sample space, let $p_\tau(E \mid y)$ denote the conditional probability of $E$ for a given value $y$ (in $S_0$) of $Y(x)$. That is, $p_\tau(E \mid y)$ is to be defined so that for each Borel set $B \subset S_0$ we have

(5.1)  $P_\tau(E \cap Y^{-1}(B)) = \int_B p_\tau(E \mid y) \, dP_\tau Y^{-1}(y)$,

where $P_\tau(E)$ is given by (4.5) for each event $E$ in the sample space, where $Y^{-1}(B)$ denotes the inverse image of $B$ under the map $Y$ from the sample space into $S_0$, and where $P_\tau Y^{-1}(B) = P_\tau[Y^{-1}(B)]$.

THEOREM 5.1. Let $S_0$ be bounded by hyperplanes. There is a determination of $p_\tau(E \mid y)$ which is independent of $\tau$ when $y$ is interior to $S_0$, and, when $y$ lies interior to a $(k - 1)$-dimensional face or $(k - j)$-dimensional ($j = 2, 3, \ldots, k$) edge or vertex of $S_0$, is independent of $\tau$ on the closure of that face, edge, or vertex.

PROOF. For $x$ in $\theta$-space, define $y(x)$ by $y(x) = (\tau_1(x_1), \tau_2(x_2), \ldots, \tau_k(x_k))$. For $x$ in the sample space, define $V(x) = y(\bar{x})$. We have $Y(x) = V(x)$ if $y(\bar{x}) \in S_0$. Define $q(E \mid y)$ to be the conditional probability of $E$ given a value $y$ of $V(x)$; this conditional probability may be taken to be independent of $\tau$, since $V(x)$ is a sufficient estimator of $\tau$. For $y$ interior to $S_0$, we define

$p_\tau(E \mid y) = q(E \mid y)$,  for all $\tau \in S_0$.

Then if $B$ is interior to $S_0$, and if $\tau \in S_0$, we have

$P_\tau(E \cap Y^{-1}(B)) = P_\tau(E \cap V^{-1}(B)) = \int_B q(E \mid y) \, dP_\tau V^{-1}(y)$.
Now suppose $y$ is on a $(k - 1)$-dimensional face or $(k - j)$-dimensional ($j = 2, 3, \ldots, k$) edge, $W$, of $S_0$, which is open in its relative topology. For $\tau$ not on the closure, $\overline{W}$, of $W$, let $p_\tau(E \mid y)$ denote any determination of the conditional probability satisfying (5.1). Choose a fixed $\beta \in W$, and let $p_\beta(E \mid y)$ denote any determination of the conditional probability satisfying (5.1). For $\tau$ on $\overline{W}$, define $p_\tau(E \mid y)$ to be equal to $p_\beta(E \mid y)$. We now wish to verify that $p_\tau$ so defined satisfies (5.1) when $B$ is a Borel subset of $W$. For such $B$ we have, by definition,

$P_\beta[E \cap Y^{-1}(B)] = \int_{E \cap Y^{-1}(B)} \exp \left\{ \sum_{i=1}^{k} n_i [\bar{x}_i \beta_i - \Theta_i(\beta_i)] \right\} dP_0(x) = \int_B p_\beta(E \mid y) \, dP_\beta Y^{-1}(y)$.

Also

$P_\tau[E \cap Y^{-1}(B)] = \int_{E \cap Y^{-1}(B)} \exp \left\{ \sum_{i=1}^{k} n_i \left( \bar{x}_i (\tau_i - \beta_i) - [\Theta_i(\tau_i) - \Theta_i(\beta_i)] \right) \right\} \cdot \exp \left\{ \sum_{i=1}^{k} n_i [\bar{x}_i \beta_i - \Theta_i(\beta_i)] \right\} dP_0(x)$.

If the MLE, $Y$, of $\tau$ is in $W$, then, by (4.7),

$\sum_{i=1}^{k} a_i^r n_i \bar{x}_i = \sum_{i=1}^{k} a_i^r n_i \theta_i(Y_i)$,

where the $a^r$ are independent vectors spanning $W$. If $\tau \in \overline{W}$, then $\tau - \beta$ is a linear combination of the $a^r$; hence

$\sum_{i=1}^{k} n_i \bar{x}_i (\tau_i - \beta_i) = \sum_{i=1}^{k} n_i \theta_i(Y_i) (\tau_i - \beta_i)$,

a function of $Y$ for fixed $\tau$, $\beta$. So also, then, is

$\exp \left\{ \sum_{i=1}^{k} n_i \left( \bar{x}_i (\tau_i - \beta_i) - [\Theta_i(\tau_i) - \Theta_i(\beta_i)] \right) \right\}$

a function, $\psi[Y(x)]$, of $Y(x)$. We have then

$P_\tau[E \cap Y^{-1}(B)] = \int_{E \cap Y^{-1}(B)} \psi[Y(x)] \, dP_\beta(x) = \int_B \psi(y) \, p_\beta(E \mid y) \, dP_\beta Y^{-1}(y) = \int_B p_\beta(E \mid y) \, dP_\tau Y^{-1}(y)$,

since

$dP_\tau Y^{-1}(y) = \psi(y) \, dP_\beta Y^{-1}(y)$.

One now verifies from the appropriate definitions above that (5.1) holds for arbitrary $\tau \in S_0$ and Borel set $B \subset S_0$. This completes the proof of Theorem 5.1.

Theorem 5.1 may be regarded as a generalization of a remark ([16], p. 77) to the effect that if $X$ and $Y$ are normally distributed random variables with unit standard deviation and means $\xi$ and $\eta$ respectively, and if $\xi$ and $\eta$ are known to satisfy a linear equation, then the foot of the perpendicular from the observation point $(x, y)$ is a sufficient estimator.

Theorem 5.1 may be interpreted somewhat as follows. Given the value of $Y(x)$, the exact knowledge of the observed sample point would imply no additional information as to how to select $\tau$ on the face (or edge) on which $Y(x)$ lies, since the conditional distribution, given $Y(x)$, is independent of $\tau$ on this face.


6. Uniform consistency of a class of estimators. In Section 4, an estimation problem of the following kind was considered. Let $k$ be a positive integer. To each positive integer $i \le k$ corresponds a population whose distribution is known except for the unknown value of its mean, $\theta_i$. The means $\theta_i$ are known to satisfy certain inequalities. The problem of estimating a distribution function from all-or-none data (bioassay) is of this kind, in which the populations are binomial and the inequalities are of the form $\theta_1 \le \theta_2 \le \cdots \le \theta_k$ (cf. Section 3; also [1], [12]).

Even if the populations are not binomial, but all belong to a common exponential family, the MLE's subject to $\theta_1 \le \theta_2 \le \cdots \le \theta_k$ are very easily determined, as follows (cf. [1], [7]). Let $\bar{x}_i$ denote the sample mean of a sample of size $n_i$ from the $i$-th population, whose mean is $\theta_i$, $i = 1, 2, \ldots, k$. If $\bar{x}_1 \le \bar{x}_2 \le \cdots \le \bar{x}_k$, these are the MLE's of the parameters $\theta_i$, $i = 1, 2, \ldots, k$. If for some $i$ we have $\bar{x}_i > \bar{x}_{i+1}$, these two means are replaced by the single ratio $(n_i \bar{x}_i + n_{i+1} \bar{x}_{i+1}) / (n_i + n_{i+1})$, obtaining an ordered set of only $k - 1$ ratios ($k - 2$ of which are sample means). This procedure is repeated until an ordered set of ratios is obtained which are monotone non-decreasing. Then for each $i$, the MLE, $\hat{\theta}_i$, of $\theta_i$ is equal to that one of the final set of ratios to which the original ratio $\bar{x}_i$ contributed.
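The pooling procedure just described can be sketched in a few lines. The following is a minimal sketch, not the authors' own formulation: it scans the means from left to right and pools a new block with its predecessor whenever the two are out of order, which is one convenient order in which to carry out the replacements (the function name and the test data are hypothetical).

```python
def pool_adjacent_violators(means, sizes):
    """Order-restricted estimates of theta_1 <= ... <= theta_k obtained
    by repeatedly pooling adjacent sample means that are out of order."""
    blocks = []  # each block: [weighted sum, total weight, number of means pooled]
    for x, n in zip(means, sizes):
        blocks.append([n * x, n, 1])
        # Pool while the last two ratios violate the required ordering.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            weighted_sum, weight, count = blocks.pop()
            blocks[-1][0] += weighted_sum
            blocks[-1][1] += weight
            blocks[-1][2] += count
    estimates = []
    for weighted_sum, weight, count in blocks:
        # Every mean that contributed to a ratio receives that ratio.
        estimates.extend([weighted_sum / weight] * count)
    return estimates

# Hypothetical sample means with one observation at each point.
example = pool_adjacent_violators([1.0, 3.0, 2.0, 4.0], [1, 1, 1, 1])
```

In this example the second and third means are out of order and are replaced by their common ratio 2.5, while the first and fourth are left unchanged.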

If the number, $k$, of observation points is held fixed, while the number of observations at each point increases indefinitely, classical theory assures the strong consistency of the $\hat{\theta}_i$ and yields their asymptotic distribution; the $\hat{\theta}_i$ will asymptotically coincide with the sample means. We shall be interested here chiefly in situations in which there are a large number of observation points, but only a few observations, perhaps only one, at each. In [1] and in [7] the local consistency of the MLE's is proved. It is assumed that there is an unknown function $\theta(t)$ (as in bioassay, for example), known to be non-decreasing and continuous, such that $\theta_i = \theta(t_i)$, $i = 1, 2, \ldots, k$. Then if $t$ is held fixed, one can achieve an arbitrarily high probability of an arbitrarily great precision at $t$ by selecting enough observation points in the neighborhood of $t$, even if only one observation is made at each. In [1] and [7] it was assumed that the populations all belonged to the same exponential family; but it is clear that the estimators $\hat{\theta}$ can be formed without regard to the distributions of the $k$ populations; they are determined by the sample means alone (of course, they will not in general be MLE's). Indeed, the proof of the local consistency of the estimators $\hat{\theta}$ does not require an assumption that the populations belong to an exponential family.

Theorem 6.2 below gives conditions sufficient for the strong uniform consistency of the estimators $\hat{\theta}$, without assuming the populations belong to an exponential family. The proof requires a somewhat strengthened form of the strong law of large numbers, which is presented in Theorem 6.1.


THEOREM 6.1. Let $r$ be a fixed positive number. Let $Y_1, Y_2, \ldots$, be independent random variables with $E(Y_i) = 0$, $E(|Y_i|^{2r}) < \infty$, and

(6.1)  $\sum_{i=1}^{\infty} E(|Y_i|^{2r}) / i^{r+1} < \infty$.

Corresponding to each positive integer $n \ge 2$, let $i_{1,n}, i_{2,n}, \ldots, i_{n,n}$ be a permutation of the positive integers $1, 2, \ldots, n$, obtained by assigning a place to the integer $n$ between some two successive integers, or at the beginning, or at the end, of the permutation corresponding to the integer $n - 1$. Define $S_{j,n} = \sum_{\nu=1}^{j} Y_{i_{\nu,n}}$, $j = 1, 2, \ldots, n$. Then

$\Pr \left\{ \lim_{n \to \infty} \max_{j = 1, 2, \ldots, n} \frac{1}{n} |S_{j,n}| = 0 \right\} = 1$.

INDICATION OF PROOF OF THEOREM 6.1. The situation is more complicated than that of the classical strong law, but familiar arguments suffice. For $\nu = 0, 1, \ldots$, arrange the terms $Y_i$ having indices $i$ such that $2^{\nu-1} < i \le 2^{\nu}$ in the order given by the permutation for $2^{\nu}$, and let $\mathcal{S}(\nu)$ denote the family of partial sums containing the first of these terms, the sum of the first two, the sum of the first three, etc. Now consider partial sums $S_{j,n}$, $j \le n$. For each $n$, choose $k = k(n)$ so that $2^{k-1} < n \le 2^{k}$. To avoid complicated subscripts, let $p = p(n) = 2^{k-1}$. Let $Z_1, Z_2, \ldots, Z_{2p-n}$ denote the random variables $Y_{n+1}, Y_{n+2}, \ldots, Y_{2p}$, written in the order given by the permutation for $2p = 2^{k}$. Let $\mathcal{U}(n)$ denote the family of partial sums: $\{Z_1, Z_1 + Z_2, \ldots, Z_1 + Z_2 + \cdots + Z_{2p-n}\}$. For fixed $j$, $n$, and for $\nu = 0, 1, 2, \ldots, k - 1$, let $T_\nu = T_\nu(j, n)$ denote the sum of terms $Y_i$ which appear in the sum $S_{j,n}$ and which have indices $i$ such that $2^{\nu-1} < i \le 2^{\nu}$. Then $T_\nu \in \mathcal{S}(\nu)$ for $\nu = 0, 1, 2, \ldots, k - 1$. Let $T_k = T_k(j, n)$ denote the minimal member of $\mathcal{S}(k)$ containing all terms appearing in $S_{j,n}$ whose indices $i$ satisfy $2^{k-1} < i \le 2^{k}$ (minimal in the sense of containing the fewest possible terms). Let $U = U(j, n)$ be the sum of terms appearing in $T_k$ of index greater than $n$; then $U \in \mathcal{U}(n)$, and $S_{j,n} = \sum_{\nu=0}^{k} T_\nu - U$. Let $\mathcal{V}(k)$ denote the family of all sums of the form $\sum_{\nu=0}^{k} W_\nu$, where $W_\nu \in \mathcal{S}(\nu)$, $\nu = 0, 1, 2, \ldots, k$. Let $V = V(j, n) = \sum_{\nu=0}^{k} T_\nu$. Then $V \in \mathcal{V}(k)$ and

$S_{j,n} = V - U$.

Let $\epsilon$ be positive. Let $A_n$ denote the event $\{\max_{0 < j \le n} |S_{j,n}| > 2^{k+1} \epsilon\}$, $B_n$ the event $\{\max_{0 < j \le n} |U(j, n)| \le 2^{k} \epsilon\}$, and $C_k$ the event $\{\max_{V \in \mathcal{V}(k)} |V| > 2^{k} \epsilon\}$. Then

(6.2)  $A_n B_n \subset C_k$  ($k = k(n)$).

It follows from Chung's inequality ([10], p. 348) and the generalized Kolmogorov inequality ([21], p. 265), that

$P(B_n) \ge 1 - A \left[ \sum_{i=n+1}^{2p} E(|Y_i|^{2r}) \right] / \left[ \epsilon^{2r} (2^{k})^{r+1} \right]$,

where $A$ is a constant depending only on $r$. From hypothesis (6.1), using Kronecker's lemma, we conclude that

$\lim_{k \to \infty} \left[ \sum_{i=1}^{2^{k}} E(|Y_i|^{2r}) \right] / (2^{k})^{r+1} = 0$,

hence there is a positive integer $k_0$ such that $P(B_n) > 1/2$ for $n > p_0 = 2^{k_0 - 1}$. Further, $A_{p_0+1}$ and $B_{p_0+1}$ are independent, and if $A^c$ denotes the complement of $A$, then for $n = p_0 + 2, p_0 + 3, \ldots$, we have that $A^c_{p_0+1} \cap A^c_{p_0+2} \cap \cdots \cap A^c_{n-1} \cap A_n$ and $B_n$ are independent. It follows from the "Lemma for Events", [21], p. 246, that

$P \left( \bigcup_{n > p_0} A_n B_n \right) \ge \tfrac{1}{2} P \left( \bigcup_{n > p_0} A_n \right)$,

so that from (6.2) we have

$P \left( \bigcup_{n > p_0} A_n \right) \le 2 P \left( \bigcup_{k \ge k_0} C_k \right)$,

hence

$P(\limsup_{n \to \infty} A_n) \le 2 P(\limsup_{k \to \infty} C_k)$,

or

$\Pr \{ \max_{0 < j \le n} |S_{j,n}| > 2^{k+1} \epsilon$ for infinitely many $n \} \le 2 \Pr \{ \max_{V \in \mathcal{V}(k)} |V| > 2^{k} \epsilon$ for infinitely many $k \}$.

Kolmogorov's method ([19], cf. also [25], p. 202), with Chung's inequality and the generalized Kolmogorov inequality, can be used to show that the right hand member is 0. Since $2^{k+1} < 4n$ ($k = k(n)$, $2^{k-1} < n \le 2^{k}$), we have

$\Pr \left\{ \max_{0 < j \le n} \frac{1}{n} |S_{j,n}| > 4\epsilon \text{ for infinitely many } n \right\} = 0$.

A standard argument completes the proof.

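We return now to the estimation problem. For $i = 1, 2, \ldots, k$, $\bar{x}_i$ is the sample mean of a sample of size $n_i$ from a population whose mean is $\theta(t_i)$. It is known that $\theta(t)$ is non-decreasing. We are concerned with the estimator $\hat{\theta}(t)$ obtained as described above. It is given ([1], [7]) by

(6.3)  $\hat{\theta}(t) = \max_{t_r \le t} \min_{t_s \ge t} \left( \sum_{\nu=r}^{s} n_\nu \bar{x}_\nu \right) / \left( \sum_{\nu=r}^{s} n_\nu \right) = \min_{t_s \ge t} \max_{t_r \le t} \left( \sum_{\nu=r}^{s} n_\nu \bar{x}_\nu \right) / \left( \sum_{\nu=r}^{s} n_\nu \right)$.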
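The two members of (6.3) can be evaluated directly at an observation point. The following is a minimal sketch, assuming the argument $t$ coincides with the observation point of (zero-based) index `i`, so that the maximum runs over $r \le i$ and the minimum over $s \ge i$; the function names and the sample data are hypothetical.

```python
def block_ratio(means, sizes, r, s):
    """Weighted mean of the sample means with indices r..s (inclusive)."""
    numerator = sum(sizes[v] * means[v] for v in range(r, s + 1))
    denominator = sum(sizes[v] for v in range(r, s + 1))
    return numerator / denominator

def max_min_estimate(means, sizes, i):
    """First member of (6.3): max over r <= i of min over s >= i."""
    k = len(means)
    return max(min(block_ratio(means, sizes, r, s) for s in range(i, k))
               for r in range(i + 1))

def min_max_estimate(means, sizes, i):
    """Second member of (6.3): min over s >= i of max over r <= i."""
    k = len(means)
    return min(max(block_ratio(means, sizes, r, s) for r in range(i + 1))
               for s in range(i, k))

# Hypothetical sample means, one observation at each point.
means, sizes = [1.0, 3.0, 2.0, 4.0], [1, 1, 1, 1]
by_max_min = [max_min_estimate(means, sizes, i) for i in range(len(means))]
by_min_max = [min_max_estimate(means, sizes, i) for i in range(len(means))]
```

For these data both lists come out as 1, 2.5, 2.5, 4: the out-of-order pair of means is replaced by its common ratio. The direct evaluation costs many more operations than successive pooling, and is shown only to make the formula concrete.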
THEOREM 6.2. Let $\theta(t)$ be continuous and non-decreasing on $(a, b)$. Let $\{s_i\}$ be a sequence of observation points dense in $(a, b)$. Let one observation be made at each point (the observation points need not be distinct). Let the variances of the observed random variables be bounded. Let $\hat{\theta}_n(t)$ denote the estimate of $\theta(t)$ based on observations made at the first $n$ observation points, defined to be constant between observation points, and continuous from the left. If $c > a$, $d < b$, then

$\Pr \left\{ \lim_{n \to \infty} \max_{c \le t \le d} |\hat{\theta}_n(t) - \theta(t)| = 0 \right\} = 1$.

PROOF. The original proof used Theorem 6.1 and a geometrical interpretation of $\hat{\theta}$ due to W. T. Reid [5] which is also used in the proof of Theorem 6.3. It required as additional hypothesis that the norm (maximum distance between adjacent points of subdivision) of the subdivision of $(a, b)$ formed by the first $n$ observation points be $O(1/n)$, and required the less restrictive hypothesis (6.1) on the variances of the observable random variables. The present proof uses an approach suggested by the referee. This proof also could be modified to use the hypothesis (6.1) on the variances instead of boundedness, together with a uniformity condition on the distribution of the observation points, but the above formulation appears more natural and useful.

We observe first that if $\hat{\theta}_n(u_i) - \theta(u_i) \to 0$ for each $u_i$ of a sequence $\{u_i\}$ dense in $(a, b)$, then it follows from the monotonicity of $\hat{\theta}_n$ and the continuity of $\theta$ that $\max_{c \le t \le d} |\hat{\theta}_n(t) - \theta(t)| \to 0$. Consequently it suffices to show that, for each individual $t \in (a, b)$, $\Pr\{\hat{\theta}_n(t) - \theta(t) \to 0\} = 1$, since it then follows that $\Pr\{\hat{\theta}_n(u_i) - \theta(u_i) \to 0 \text{ for all } u_i\} = 1$, if $\{u_i\}$ is any countable sequence of points in $(a, b)$.

We now prove that for fixed $t \in (a, b)$, $\Pr\{\hat{\theta}_n(t) - \theta(t) \to 0\} = 1$. It suffices to prove that for every $\epsilon > 0$ we have

(6.4)  $\Pr\{\liminf_{n \to \infty} \hat{\theta}_n(t) - \theta(t) \ge -\epsilon\} = 1$

and

(6.5)  $\Pr\{\limsup_{n \to \infty} \hat{\theta}_n(t) - \theta(t) \le \epsilon\} = 1$.

We prove the first; the proof of the second is similar.

We suppose the sequence $\{s_i\}$ of observation points chosen, not necessarily distinct nor ordered according to increasing index, and an observation $Z_i$ made at each, so that $E(Z_i) = \theta(s_i)$. Let $\sigma_i^2 = V(Z_i)$, the variance of the random variable $Z_i$ observable at $s_i$. For fixed $n$, let $t_1, t_2, \ldots, t_k$ denote the $k = k(n)$ distinct observation points among $s_1, s_2, \ldots, s_n$, arranged in increasing order, and let $n_i$ denote the number of observations made at $t_i$, so that $\sum_{i=1}^{k} n_i = n$.

Let $t \in (a, b)$. Given $\epsilon > 0$, choose $n$ sufficiently large that there is a $t_r < t$ such that $|\theta(t_r) - \theta(t)| < \epsilon$. By (6.3),

$\hat{\theta}_n(t) - \theta(t) \ge \min_{t_s \ge t} \frac{\sum_{\nu=r}^{s} n_\nu [\bar{x}_\nu - \theta(t_r)]}{\sum_{\nu=r}^{s} n_\nu} - (\theta(t) - \theta(t_r))$.
Since $\theta$ is non-decreasing, we have $\theta(t_r) \le \theta(t_\nu)$ for $\nu \ge r$, hence

(6.6)  $\hat{\theta}_n(t) - \theta(t) \ge \min_{t_s \ge t} \frac{\sum_{\nu=r}^{s} n_\nu [\bar{x}_\nu - \theta(t_\nu)]}{\sum_{\nu=r}^{s} n_\nu} - \epsilon$.

For $p = 1, 2, \ldots$, let $s_{i_p}$ denote the $p$th of the members of the sequence $\{s_i\}$ which lie at, or to the right of, $t_r$. Consider the sequence of observable random variables, centered at means, $Z_{i_p} - \theta(s_{i_p})$. The sums $\sum_{\nu=r}^{s} n_\nu [\bar{x}_\nu - \theta(t_\nu)]$ are not successive partial sums of this sequence or of any sequence, since as $p$ increases new observation points are interspersed among the old. However, in applying Theorem 6.1 with $Y_p = Z_{i_p} - \theta(s_{i_p})$, we find that the ratios $\sum_{\nu=r}^{s} n_\nu [\bar{x}_\nu - \theta(t_\nu)] / \sum_{\nu=r}^{s} n_\nu$ are just such ratios $S_{j,n}/n$ as are considered there. We conclude from (6.6) that $\Pr\{\liminf_{n \to \infty} \hat{\theta}_n(t) - \theta(t) \ge -\epsilon\} = 1$. A similar argument shows that for $\epsilon > 0$,

$\Pr\{\limsup_{n \to \infty} \hat{\theta}_n(t) - \theta(t) \le \epsilon\} = 1$,

whence

$\Pr\{\lim_{n \to \infty} \hat{\theta}_n(t) - \theta(t) = 0\} = 1$.

Together with the earlier remarks, this completes the proof of Theorem 6.2.
Theorem 6.3, below, gives an asymptotic lower bound for the probability of achieving a given uniform precision on a closed subinterval of $(a, b)$.

THEOREM 6.3. For a fixed positive integer $n$, let $n$ observations be made at observation points $t_1 \le t_2 \le \cdots \le t_k$ in $(a, b)$, $n_i$ observations being made at $t_i$, $i = 1, 2, \ldots, k$, so that $n = \sum_{i=1}^{k} n_i$. Let $\Delta = \max_{i = 0, 1, 2, \ldots, k} (t_{i+1} - t_i)$, $t_0 = a$, $t_{k+1} = b$. Let the populations be such as to permit the application of the Central Limit Theorem as required in [15] (cf. also [9]; for an appropriate Lindeberg condition, see [22], p. 127). Let $\theta(t)$ have a bounded derivative, $|\theta'(t)| \le K$, $K > 0$, for $t \in (a, b)$, and let $\sigma^2 = \sum_{i=1}^{k} n_i \sigma_i^2$, where $\sigma_i^2$ is the variance of an observation made at $t_i$, $i = 1, 2, \ldots, k$. For $z > 0$, let $h = [2z\sigma\Delta / K]^{1/2}$, $c = a + h$, $d = b - h$. Then

$\Pr \left\{ \max_{c \le t \le d} |\hat{\theta}(t) - \theta(t)| < 2\sqrt{2Kz\sigma\Delta} \right\} \gtrsim \frac{4}{\pi} \sum_{\nu=0}^{\infty} \frac{(-1)^{\nu}}{2\nu + 1} \exp [-(2\nu + 1)^2 \pi^2 / 8z^2]$.

The symbol "$\gtrsim$" is to be interpreted as "asymptotically (as $n \to \infty$) at least as large as". The estimate is most nearly accurate if only one observation is made at each point, if the observation points are distributed uniformly over $(a, b)$, and if $\theta'(t)$ is constant.
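The right-hand member of the bound in Theorem 6.3 is easily evaluated numerically. The following is a minimal sketch, assuming that truncating the alternating series after a fixed number of terms (here 200, an arbitrary choice) is adequate for moderate values of $z$; the function name is hypothetical.

```python
import math

def series_lower_bound(z, terms=200):
    """(4/pi) * sum over v >= 0 of (-1)^v / (2v+1) * exp(-(2v+1)^2 pi^2 / (8 z^2)),
    truncated after `terms` terms."""
    total = 0.0
    for v in range(terms):
        odd = 2 * v + 1
        total += ((-1) ** v / odd) * math.exp(-(odd ** 2) * math.pi ** 2 / (8.0 * z * z))
    return 4.0 / math.pi * total

bounds = {z: series_lower_bound(z) for z in (1.0, 2.0, 3.0)}
```

The bound increases with $z$: it is roughly 0.37 at $z = 1$ and exceeds 0.99 at $z = 3$, so that a fairly large $z$ is needed before the stated precision is asserted with high probability.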

PROOF. Order the observations according to increasing t, ordering in an arbitrary
way those occurring at the same observation point. Let $Z_{n,\nu}$ denote the
$\nu$th observation, $\nu = 1, 2, \ldots, n$; its mean is $\theta(t_i)$ and its
variance $\sigma_i^2$ if it is made at the observation point $t_i$. For positive
integers $j \le k$, define $N_j = \sum_{i \le j} n_i$,
$s(N_j) = \sum_{i \le j} n_i\theta(t_i)$, and
$s^*(N_j) = \sum_{\nu \le N_j} Z_{n,\nu}$. We have $s^*(N_j)$ as one of the partial
sums of the sequence $Z_{n,\nu}$, and $s(N_j)$ as its expectation. If $S_{n,\nu}$
denotes the $\nu$th partial sum, and $s_n^2$ the variance of $S_{n,n}$, then it is
known that

$$\lim_{n\to\infty}\Pr\Big\{\max_{\nu \le n} |S_{n,\nu} - E(S_{n,\nu})| < z s_n\Big\} = \frac{4}{\pi}\sum_{i=0}^{\infty}\frac{(-1)^{i}}{2i+1}\exp[-(2i+1)^2\pi^2/8z^2]$$

(strictly speaking, we require that the theorem as developed in [15] and [9]
be generalized so as to apply to sums of the form $\sum_{\nu=1}^{n} X_{n,\nu}$,
where the $X_{n,\nu}$ are independent for distinct $\nu$, rather than to sums of
the form $\sum_{\nu=1}^{n} X_{\nu}$; but only trivial modifications are required
in the proofs). Define $s(u)$ and $s^*(u)$ to be linear between successive
integers; then

$$\Pr\Big\{\max_{0 \le u \le n} |s^*(u) - s(u)| < z\sigma\Big\} \gtrsim \frac{4}{\pi}\sum_{i=0}^{\infty}\frac{(-1)^{i}}{2i+1}\exp[-(2i+1)^2\pi^2/8z^2]. \tag{6.7}$$

We observe that $s(u)$ is a convex function whose graph consists of line segments:
for $N_{i-1} < u < N_i$ we have $s'(u) = \theta(t_i)$, $i = 1, 2, \ldots, k$. The
graph of $s^*(u)$ also consists of line segments, but it need not be convex, since
the observed means need not increase with i.

Let $g(u)$ denote the greatest convex function not greater than $s^*(u)$; the graph
of this function consists of line segments. We denote by $g'(u)$ ($s'(u)$) the
derivative of $g(u)$ ($s(u)$) where it is defined, and the left-hand limit of the
derivative at a corner. One verifies from formulas (6.3) that
$\hat\theta(t_i) = g'(N_i)$, $i = 1, 2, \ldots, k$. Now let $u'$ be fixed, so that
$u' < \sum_{t_i \le d} n_i$. Suppose $\max_{0 \le u \le n} |s^*(u) - s(u)| < z\sigma$.
Then for $u \ge u'$ we have

$$g(u) \le s^*(u) < s(u) + z\sigma.$$

Since the point $(u', g(u'))$ is on a line segment whose endpoints are at vertical
distance less than $z\sigma$ from the graph of s (or else it is itself such an
endpoint), and since s is convex, we have also

$$g(u') > s(u') - z\sigma.$$

Hence $g(u) - g(u') < s(u) - s(u') + 2z\sigma$. Therefore

$$g'(u') \le \frac{g(u) - g(u')}{u - u'} < \frac{s(u) - s(u')}{u - u'} + \frac{2z\sigma}{u - u'}.$$

Choose i, j so that $N_{i-1} < u' \le N_i$, $N_{j-1} < u \le N_j$. We have
$[s(u) - s(u')]/(u - u') \le s'(u) = \theta(t_j) \le \theta(t_i) + K(t_j - t_i) = s'(u') + K(t_j - t_i)$.
But $t_j - t_i \le (N_{j-1} - N_i)\Delta + \Delta \le (u - u' + 1)\Delta$, so that

$$g'(u') < s'(u') + K(u - u' + 1)\Delta + 2z\sigma/(u - u')$$

for $u' < u \le n$. We choose $u = u' + [2z\sigma/K\Delta]^{1/2}$, and find that
$g'(u') - s'(u') < 2(2Kz\sigma\Delta)^{1/2} + K\Delta \cong 2(2Kz\sigma\Delta)^{1/2}$.
Similarly, $g'(u') - s'(u') > -2[2Kz\sigma\Delta]^{1/2}$, if
$u' \ge \sum_{t_i < c} n_i$. Since $\hat\theta(t_i) = g'(N_i)$ and
$\theta(t_i) = s'(N_i)$, $i = 1, \ldots, k$, we have

$$\max_{c \le t \le d} |\hat\theta(t) - \theta(t)| < 2[2Kz\sigma\Delta]^{1/2} \quad\text{if}\quad \max_{0 \le u \le n} |s^*(u) - s(u)| < z\sigma.$$

The conclusion of the theorem follows from (6.7).
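The greatest convex minorant used in the proof can be computed by pooling adjacent violators. The sketch below is a standard technique offered for illustration, not necessarily the computation the author had in mind: it takes the cumulative sum diagram with vertices $(N_j, s^*(N_j))$ and returns the left-hand slopes $g'(N_i)$. The function name `gcm_slopes` and its argument names are assumptions.

```python
def gcm_slopes(means, counts):
    """Left-hand slopes of the greatest convex minorant at each point.

    means[i]  : average of the observations made at the i-th observation point
    counts[i] : number of observations made there (the n_i of the text)
    The points are assumed to be listed in increasing order of t.
    """
    blocks = []  # each block holds [pooled mean, total count, points pooled]
    for mean, count in zip(means, counts):
        blocks.append([float(mean), float(count), 1])
        # Pool neighbouring blocks while their means are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    slopes = []
    for mean, _, npoints in blocks:
        slopes.extend([mean] * npoints)  # one slope per observation point
    return slopes
```

The returned slopes are nondecreasing; wherever the raw means already increase, they are returned unchanged.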

If K = 0, or if we wish a lower bound on the probability for uniform precision
over a larger subinterval [c, d], we must simply take u in the above discussion
equal to $u' + h/\Delta$, where $h = \max[b - d, c - a]$, obtaining

$$|g'(u') - s'(u')| \le K(h + \Delta) + 2z\sigma\Delta/h,$$

$$\max_{c \le t \le d} |\hat\theta(t) - \theta(t)| < K(h + \Delta) + 2z\sigma\Delta/h.$$

To get an idea of the rate of convergence guaranteed with at least a certain
probability, suppose $\theta(t) = t$ on (0, 1), and that $\Delta = 1/(n + 1)$, one
binomial observation being made at each observation point $i/(n + 1)$,
$i = 1, 2, \ldots, n$. We find $\sigma^2 \approx n/6$, $K = 1$,
$h = (2z^2/3n)^{1/4}$, and

$$\Pr\Big\{\max_{c \le t \le d} |\hat\theta(t) - \theta(t)| < 2(2z^2/3n)^{1/4}\Big\} \gtrsim \frac{4}{\pi}\sum_{\nu=0}^{\infty}\frac{(-1)^{\nu}}{2\nu+1}\exp[-(2\nu+1)^2\pi^2/8z^2],$$

which suggests that the minimum precision (reciprocal of error) assured with a
given probability increases like $n^{1/4}$. On the other hand, if the observations are
concentrated near a given point, Theorem 3.1 of [1] suggests that the precision 
at that point increases like nr’. 
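The rate in the binomial example can be checked numerically from the bound $2(2Kz\sigma\Delta)^{1/2}$ of Theorem 6.3. This is a small sketch under the example's assumptions ($\theta(t) = t$, one observation at each point $i/(n+1)$, so each variance is $t_i(1 - t_i)$); the function names are illustrative and not from the paper.

```python
import math

def error_bound(K, z, sigma, delta):
    """Uniform error bound 2 * (2 K z sigma delta)^(1/2) from Theorem 6.3."""
    return 2.0 * math.sqrt(2.0 * K * z * sigma * delta)

def binomial_example_bound(n, z):
    """Bound for theta(t) = t on (0, 1) with one binomial observation at each
    point i / (n + 1): K = 1, delta = 1 / (n + 1), and sigma^2 is the sum of
    the variances t_i * (1 - t_i)."""
    delta = 1.0 / (n + 1)
    sigma2 = sum((i * delta) * (1.0 - i * delta) for i in range(1, n + 1))
    return error_bound(1.0, z, math.sqrt(sigma2), delta)
```

Multiplying n by 16 roughly halves the bound, which is the fourth-root behaviour: the guaranteed error shrinks like $n^{-1/4}$.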
INTERSECTION REGION CONFIDENCE PROCEDURES WITH AN
APPLICATION TO THE LOCATION OF THE MAXIMUM IN QUADRATIC
REGRESSION


By DAVID L. WALLACE
University of Chicago 

1. Summary. Confidence region procedures for multidimensional quantities 
sometimes require prohibitive amounts of computation and the regions are 
difficult to represent in a useful way. Some approximate procedures are con- 
structed by using regions obtained as the intersection of several regions, each 
much easier to construct. The procedures are applicable to the solution of si- 
multaneous equations, whose coefficients are subject to random error. Approxi- 
mations by convex polyhedra and by parallelepipeds are proposed. The pro- 
cedures are illustrated for setting a confidence region for the location of the 
vertex of a quadratic regression surface. 


2. Confidence regions. In this section, I give a subjective evaluation of the
requirements for a useful confidence region procedure.

Suppose that $\lambda$ is a (multidimensional) quantity defined as a function of the
parameters of the distribution sampled. The problem of constructing confidence
regions for the true value(s) of $\lambda$ will be considered.

A confidence region and a point estimate for $\lambda$ are often used to summarize
the information about $\lambda$ in the observed sample. Their use is an attempt to
convey in a comprehensible way some idea of the extent and character of the
determination of $\lambda$, taking account of the inaccuracies of measurement. Any use
of the confidence region in making decisions about further experimentation, 
process operations, etc., will be informal. The exact confidence level is not im- 
portant and even the frequency interpretation of the procedure is not essential, 
both serving principally as “benchmarks” for purposes of comparison and 
familiarity. What is important is that the region be represented geometrically 
or analytically so that the user can comprehend its size, shape and location. 
Approximations to the region which simplify this representation will be valuable 
as long as they do not greatly change the confidence level. 

The theoretical specification of confidence procedures and the investigation of
their statistical properties (level, power, etc.) are usually accomplished through
an associated family of tests of hypotheses. The condition that the true quantity
have the particular value $\lambda$ is a condition on the parameters and hence a
statistical hypothesis. Denote it by $H_\lambda$. Then, given a level $\alpha$ test
of $H_\lambda$ for each value of $\lambda$, the confidence procedure defined by
$R = \{\lambda : H_\lambda \text{ not rejected}\}$ is an error level $\alpha$
confidence procedure. The "error level" (= 1 − "confidence level") of a confidence
procedure is usually more convenient than the confidence level and coincides with
usage in the related testing procedures. (Strictly, there should be a notational
distinction between a confidence procedure R as a set-valued random variable and a
confidence region R as the realization for a particular sample. However, the
meaning of the symbol R should be clear from the context or the verbal distinction
between confidence procedure and confidence region.)

This method does not necessarily give usable confidence regions, even when
the test of each $H_\lambda$ separately is satisfactory. In order that the tests
need not actually be carried out for each $\lambda$, some continuity in $\lambda$
must be required of the family of tests. Any (non-randomized) test of $H_\lambda$
can be represented (not uniquely) by a statistic $h(\lambda)$ where, for any sample
z, $H_\lambda$ is rejected if and only if $h(\lambda, z) > 0$. If there is a choice
of h which is, for a fixed sample z, continuous in $\lambda$, then the confidence
region R is a closed set with boundary equation $h(\lambda, z) = 0$.

However, continuity of $h(\lambda)$ is not generally enough. If $\lambda$ is one
dimensional, a useful confidence region is usually an interval. A solution
providing the limits of the interval is satisfactory, but one providing only a
complex equation $h(\lambda) = 0$ for the limits may not be.

When $\lambda$ is multidimensional, the problems of computation and representation
are greatly magnified. The boundary equation will likely increase in complexity
rapidly with increasing dimension. But more serious is the difficulty of
representing the region even when $h(\lambda)$ is given explicitly in terms of simple functions.
The boundary can be plotted in two dimensions, as can cross sections in more 
than two dimensions, though with effectiveness decreasing with increasing di- 
mension. A principal difficulty is that few shapes are readily visualized in more 
than two dimensions, or, what is more essential, that comprehension of a region 
from the equation of its boundary is restricted to very simple surfaces. 

The simplest regions are the parallelepipeds which can be completely de- 
scribed by giving limits on each coordinate of a coordinate system related by an 
affine transformation to the original coordinate system, or equivalently, by giving 
p linear double inequalities on the coordinates of $\lambda$.

The next simplest regions would seem to be the convex polyhedra. When the 
number of faces is small, the region is simply described by giving the linear 
inequalities corresponding to each face and is only slightly more complex than 
the parallelepiped. As an approximate representation of a region with corners, 
the number of faces is likely too large to permit use of the inequalities and the 
region must be thought of, with greater difficulty and less adequately, in terms 
of the corners (vertices). 

Ellipsoidal confidence regions are important, largely because they occur natu- 
rally in the classical normal theory of means and regression coefficients and also 
in the general large sample confidence theory. They are probably visualized as 
rounded boxes and their description by a center and lengths and directions of 
principal axes corresponds closely to the parallelepiped description. 


3. Geometrical idea of intersection confidence regions. In many multidimensional
confidence problems, interest centers more on the separate coordinates
or on linear combinations of them than on the multidimensional quantity (e.g.,
the means in an analysis of variance). The usual ellipsoidal confidence region is of
little value. Scheffé's [12] multiple comparisons procedure amounts to
representing the ellipsoid as the intersection of all the slabs between parallel pairs of
tangent hyperplanes (see Figure 1a). Each slab gives a confidence interval for a
single linear combination of coordinates. The totality of such intervals has the
same joint error level as does the ellipsoidal region. The procedure permits
making as many confidence statements on linear combinations as desired and
permits the posterior selection of "most interesting" statements.

FIG. 1. (a) An ellipse as intersection of straight strips. (b) A standard region and a
curved strip. (c) A standard region as intersection of curved strips.

The same representation by slabs is valid and useful for any convex region 
and this is the basis for the multiple comparisons procedure given by Tukey 
[13], Roy and Bose [11] and others. 

Even in problems in which the multidimensional quantity is of principal 
interest, the multiple comparison methods provide a means for approximating 
convex confidence regions by convex polyhedra, regions more easily described 
and visualized. Often, the linear inequalities defining the polyhedron are much 
more easily obtained (computationally) than is the boundary equation of the 
exact region. 

In the small sample theory of more complicated problems (such as the location 
of a regression surface maximum), the standard confidence regions are not 
ellipsoids and may not even be convex, connected, or bounded. There is no 
practical way to determine from a particular boundary equation if the region is 
convex. The intersection region procedures developed here are an attempt 
to construct some usable approximate representations for some of these problems. 

The idea is to approximate a standard region as the intersection of several 
regions each of which is fairly easy to represent and to compute. They are
typically (in two dimensions) curved strips rather than straight strips (Figures
1b and 1c). The regions are determined essentially by applying the multiple
comparisons theory at an earlier stage in the confidence region construction. 
The approximation is carried one stage further in which the curved strips are 
approximated by straight strips and their intersections by convex polyhedra. 







4. Intersection region procedures for families of general linear hypotheses.
The most important class of quantities amenable to intersection region
procedures arises from general linear hypotheses in general linear models (cf. Wilks
[14], Chap. 8). An n-dimensional vector $z = X\beta + e$ is observed in which X is a
known $n \times m$ matrix of rank m, $\beta$ the unknown m-dimensional vector of
"regression" coefficients and the n components of e are independently and normally
distributed each with zero mean and variance $\sigma^2$. Least squares estimates of
$\beta$ and $\sigma^2$ are

$$b = (X'X)^{-1}(X'z), \qquad s^2 = \frac{1}{n - m}\,[z'z - (z'X)(X'X)^{-1}(X'z)],$$

distributed independently as a normal with mean $\beta$ and covariance matrix
$\sigma^2(X'X)^{-1}$ and as $\sigma^2\chi^2/(n - m)$ on $\nu = n - m$ degrees of freedom.
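For readers who want to see the two estimates computed, here is a minimal standard-library sketch of $b = (X'X)^{-1}X'z$ and $s^2 = [z'z - (z'X)(X'X)^{-1}(X'z)]/(n - m)$. The helper names `solve` and `least_squares` are assumptions, and solving the normal equations by Gaussian elimination is chosen only to keep the example self-contained.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(X, z):
    """Return (b, s2): coefficient estimates and the residual variance
    estimate on n - m degrees of freedom."""
    n, m = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(m)]
           for a in range(m)]
    Xtz = [sum(X[i][a] * z[i] for i in range(n)) for a in range(m)]
    b = solve(XtX, Xtz)
    # Residual sum of squares: z'z - (z'X)(X'X)^{-1}(X'z) = z'z - (X'z)'b.
    rss = sum(v * v for v in z) - sum(Xtz[a] * b[a] for a in range(m))
    return b, rss / (n - m)
```

For a straight-line fit, each row of `X` is `[1, x]`; with responses 0, 1, 0, 1 at x = 0, 1, 2, 3 both coefficients come out as 0.2.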

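Many quantities of interest can be represented as the roots of sets of
simultaneous linear equations in the regression coefficients; as the root in
$\lambda$ of the equations

$$\sum_{j=1}^{m} \beta_j\,\delta_{ij}(\lambda) = \delta_{i0}(\lambda); \qquad (i = 1, \ldots, p)$$

linear in the regression coefficients, but of arbitrary though specified form in
$\lambda$. Linear combinations of means or regression coefficients are included by
choosing all $\delta_{ij}(\lambda)$ to be constants and taking
$\delta_{i0}(\lambda) = \lambda_i$. Two one-dimensional quantities typical of the
more complicated problems motivating the intersection procedures are:
(i) $\beta_1/\beta_2$: $\delta_{11} = 1$, $\delta_{12} = -\lambda_1$, other
$\delta_{1j}$ and $\delta_{10}$ zero.
(ii) location of vertex of regression curve
$\beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$: $\delta_{11} = 1$,
$\delta_{12} = 2\lambda_1$, $\delta_{13} = 3\lambda_1^2$, other $\delta_{1j}$ and
$\delta_{10}$ zero.
If $\lambda$ is the true value of the quantity, it satisfies the equations:

$$H_\lambda : \sum_{j=1}^{m} \beta_j\,\delta_{ij}(\lambda) = \delta_{i0}(\lambda); \qquad (i = 1, \ldots, p)$$

or written in vector form (with natural definitions):

$$H_\lambda : A_\lambda\beta = \delta_{0\lambda}$$

and is a "general linear hypothesis." Any procedure for testing $H_\lambda$ for
every $\lambda$ leads to a confidence region procedure for $\lambda$. Several
procedures will be used.

Let $\delta_i(\lambda) = \sum_j \beta_j\delta_{ij}(\lambda) - \delta_{i0}(\lambda)$
and $\delta_\lambda = [\delta_1(\lambda), \ldots, \delta_p(\lambda)]' = A_\lambda\beta - \delta_{0\lambda}$.
For each $\lambda$, the least squares estimate of $\delta_\lambda$ is
$d_\lambda = A_\lambda b - \delta_{0\lambda}$. $d_\lambda$ is normally distributed
with $E(d_\lambda) = \delta_\lambda$ and
$\mathrm{Cov}(d_\lambda) = \sigma^2 A_\lambda (X'X)^{-1} A_\lambda' = \sigma^2 V_\lambda$.
(Note that $d_\lambda$ and $V_\lambda$ will generally depend on $\lambda$ except
when $A_\lambda$ is a matrix of constants, as it is for the usual simple problems.)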

In all that follows, the observations are used only to compute $d_\lambda$ and
$V_\lambda$ and the sample variance $s^2$. $V_\lambda$ is assumed nonsingular (and
hence positive definite) for every $\lambda$.

The likelihood ratio test of the hypothesis $H_\lambda : \delta_\lambda = 0$ is:
reject $H_\lambda$ if $T_\lambda \ge A$ with test statistic

$$T_\lambda = \frac{d_\lambda' V_\lambda^{-1} d_\lambda}{p\,s^2}.$$

When $H_\lambda$ is true, $T_\lambda$ has an F distribution with p and $\nu$ degrees
of freedom. In general, $T_\lambda$ has a noncentral F distribution with
noncentrality parameter $\delta_\lambda' V_\lambda^{-1}\delta_\lambda/\sigma^2$.
The test with critical value $F_{p,\nu;\alpha}$ (the upper $100\alpha$ percent point
of the F distribution) is a similar level $\alpha$ test and is the uniformly most
powerful invariant level $\alpha$ test of $H_\lambda$. Throughout the paper, this
test will be called the "standard" test of the general linear hypothesis.

The corresponding confidence procedure for $\lambda$ is the "standard" level
$\alpha$ procedure: $R_S = \{\lambda : T_\lambda \le F_{p,\nu;\alpha}\}$ which can
be written as

$$R_S = \{\lambda : d_\lambda' V_\lambda^{-1} d_\lambda \le p\,s^2 F_{p,\nu;\alpha}\}$$

or

$$R_S = \Big\{\lambda : \sum_{i,j} d_{\lambda i}\, d_{\lambda j}\, V_{\lambda ij} - p\,s^2 F_{p,\nu;\alpha}\,|V_\lambda| \le 0\Big\} \tag{4.1}$$

in which $V_{\lambda ij}$ is the cofactor of the element $v_{\lambda ij}$ in
$V_\lambda$. This confidence procedure has been constructed and used by Box and
Hunter [2].

The confidence procedure can be difficult to use especially when the elements
of $V_\lambda$ depend on $\lambda$. For then the boundary equation and sometimes
the region itself can be very complicated and the necessary computation messy.

The intersection region procedure is based on working separately with the p
single equations $\delta_i(\lambda) = 0$ composing $H_\lambda$ or, more
conveniently, with linear combinations of these equations. When
$\delta_i(\lambda) = \beta_i - \lambda_i$ so that the quantity of interest is the
vector of regression coefficients, the procedure reduces to the multiple
comparisons procedure of setting confidence limits on some or all linear
combinations of the $\{\beta_i\}$.

Let $k_1, \ldots, k_r$ be any r prescribed p-dimensional vectors, and let
$H_{\lambda i}$ denote the hypothesis $k_i'\delta_\lambda = 0$. Every
$H_{\lambda i}$ is true when $H_\lambda$ is, and if the vectors $\{k_i\}$ span
p-space, the truth of all r "component" hypotheses implies that $H_\lambda$ is true.

Suppose each hypothesis $H_{\lambda i}$ were tested according to: reject
$H_{\lambda i}$ if $T_{\lambda i} > A$. A natural joint test of $H_\lambda$ is to
reject $H_\lambda$ if any $H_{\lambda i}$ is rejected, i.e. if

$$U_\lambda = \max_{1 \le i \le r} T_{\lambda i} > A. \tag{4.2}$$


Corresponding to each component set of tests is a confidence region $R_i$ such
that

$$R_i = \{\lambda : T_{\lambda i} \le A\}$$

and the intersection region $R_I$ is defined as the intersection of the r regions
$\{R_i\}$, or equivalently as the region defined by the joint test:

$$R_I = \{\lambda : U_\lambda \le A\}.$$

Each $H_{\lambda i}$ is a linear hypothesis with standard test statistic (which
will be used)

$$T_{\lambda i} = \frac{(k_i' d_\lambda)^2}{s^2\,(k_i' V_\lambda k_i)}.$$

The component region $R_i$ is given by

$$R_i = \{\lambda : (k_i' d_\lambda)^2 - A s^2 (k_i' V_\lambda k_i) \le 0\}$$

and is usually much simpler, computationally and geometrically, than the
standard region $R_S$. When $H_\lambda$ is true, each $T_{\lambda i}$ is
distributed as $F_{1,\nu}$. The joint distribution of the $T_{\lambda i}$ follows,
in principle, from the fact that

$$\frac{k_i' d_\lambda}{(k_i' V_\lambda k_i)^{1/2}}, \qquad i = 1, \ldots, r, \tag{4.3}$$

are distributed as a multivariate normal with zero means (when $H_\lambda$ is true),
variances $\sigma^2$, and correlations depending on the $\{k_i\}$ and on $\lambda$.
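To make the component region concrete, consider the ratio $\lambda = \beta_1/\beta_2$ of example (i), where p = 1, $d_\lambda = b_1 - \lambda b_2$ and $V_\lambda = v_{11} - 2\lambda v_{12} + \lambda^2 v_{22}$, with $v_{ij}$ the relevant elements of $(X'X)^{-1}$. The region $(d_\lambda)^2 - A s^2 V_\lambda \le 0$ is then a quadratic inequality in $\lambda$. The sketch below solves it; the function name `ratio_region`, its argument names and the returned labels are assumptions, and the critical value `A` is supplied by the caller (for example from F tables).

```python
import math

def ratio_region(b1, b2, v11, v12, v22, s2, A):
    """Solve (b1 - lam*b2)^2 - A*s2*(v11 - 2*lam*v12 + lam^2*v22) <= 0 in lam.

    Returns (kind, lo, hi); for "complement of interval" the region is
    lam <= lo or lam >= hi.
    """
    c = A * s2
    qa = b2 * b2 - c * v22            # coefficient of lam^2
    qb = -2.0 * (b1 * b2 - c * v12)   # coefficient of lam
    qc = b1 * b1 - c * v11            # constant term
    disc = qb * qb - 4.0 * qa * qc
    if qa > 0:                        # parabola opens upward: bounded interval
        r = math.sqrt(max(disc, 0.0))
        return ("interval", (-qb - r) / (2 * qa), (-qb + r) / (2 * qa))
    if qa < 0 and disc > 0:           # opens downward with two real roots
        r = math.sqrt(disc)
        lo, hi = sorted(((-qb - r) / (2 * qa), (-qb + r) / (2 * qa)))
        return ("complement of interval", lo, hi)
    if qa < 0:                        # never positive: every lam is accepted
        return ("whole line", -math.inf, math.inf)
    if qb > 0:                        # degenerate linear cases (qa == 0)
        return ("half line", -math.inf, -qc / qb)
    if qb < 0:
        return ("half line", -qc / qb, math.inf)
    return ("whole line", -math.inf, math.inf)
```

With $b = (2, 4)$, $(X'X)^{-1}$ the identity and $s^2 = 1$, a small critical value gives a bounded interval around 0.5, a larger one gives the complement of an interval, and a still larger one gives the whole line, so the region need not be bounded.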

The choice of the critical value A must be a compromise between control of
the error level of the procedure, ease of computation, and simplicity of the
resulting boundary equation. In order that the intersection region procedure have a
constant error level $\alpha$, A must be the $100\alpha$ percent point of the
distribution of $U_\lambda$, the studentized maximum of the squares of correlated
normal deviates. But except for a few special cases, these percent points cannot
now be obtained without major computation. And since they would likely depend on
$\lambda$ through the correlations, the boundary equations of the intersection
region would be complicated by the presence of the function $A(\lambda)$. The use
of the exact percent point, if obtainable, for some single "compromise" value of
$\lambda$ might be an excellent choice. I assume throughout that a constant (in
$\lambda$) critical value is used. Attention here will mainly be restricted to two
approximate choices, each "conservative" in the sense that the error level of the
intersection region procedure does not exceed the nominal level $\alpha$.

THEOREM 4.1. For any set of prescribed $k_1, \ldots, k_r$, the confidence region
procedure $R_I$ using critical value $A = F_{1,\nu;\alpha/r}$ has error level not
exceeding $\alpha$.

Since $U_\lambda$ exceeds A if and only if at least one $T_{\lambda i}$ exceeds A,

$$P(U_\lambda > A) \le \sum_{i=1}^{r} P(T_{\lambda i} > A).$$

(This holds generally, without regard for the meaning of $T_{\lambda i}$ and leads
to an immediate generalization of the theorem for joint tests and intersection
procedures based on any separate tests of any set of component hypotheses.) When
$H_\lambda$ is true, every $T_{\lambda i}$ has an $F_{1,\nu}$ distribution so with
$A = F_{1,\nu;\alpha/r}$, the right hand side is exactly $\alpha$ and the joint
test of $H_\lambda$ has error level not exceeding $\alpha$. This holds for every
$\lambda$, so the error level of the intersection region procedure is also
so bounded.

The actual error level using $A = F_{1,\nu;\alpha/r}$ will depend on $\lambda$
through the correlations of the $\{k_i' d_\lambda\}$. When these correlations are
small, the error level will be quite close to $\alpha$. Some results on the
closeness of the bound $\alpha$ are given in section eight.
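A rough feel for how close the bound of Theorem 4.1 can be comes from an idealized case that is an assumption of this sketch, not a result of the paper: if the r component tests were exactly independent (which would require uncorrelated combinations and a known variance, since a shared $s^2$ induces dependence), each run at level $\alpha/r$, the joint error level would be $1 - (1 - \alpha/r)^r$. The function name is illustrative.

```python
def independent_error_level(alpha, r):
    """Joint error level of r independent component tests, each at level
    alpha / r (an idealized case; independence is an assumption here)."""
    return 1.0 - (1.0 - alpha / r) ** r
```

This value never exceeds $\alpha$ and never falls below $\alpha - \alpha^2/2$, so in this idealized case the nominal level is nearly attained for the usual small $\alpha$.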

As the number r of linear combinations is increased, the correlations increase 
and the bound gets worse. The behavior of the intersection region is best studied 
in the limiting case where all linear combinations are used. The distribution 
theory is exactly that used by Scheffé [12] and is based on an algebraic lemma. 

LEMMA (Scheffé). If d is any p-vector, V any symmetric positive definite
$p \times p$ matrix, then

$$\sup_{\text{all } k} \frac{(k'd)^2}{k'Vk} = d'V^{-1}d.$$

THEOREM 4.2. The intersection region procedure using all linear combinations,
each with critical value $A = pF_{p,\nu;\alpha}$, is identical to the level
$\alpha$ standard region procedure.

COROLLARY. Any intersection region based on r prescribed combinations $\{k_i\}$
and critical value $A_0$ always contains the standard region with error level

$$P\{F_{p,\nu} > A_0/p\}.$$

Applying the lemma to $d_\lambda$ and its covariance matrix $\sigma^2 V_\lambda$
and studentizing with $s^2$,

$$\sup_{\text{all } k} \frac{(k'd_\lambda)^2}{s^2\,(k'V_\lambda k)} = \frac{d_\lambda' V_\lambda^{-1} d_\lambda}{s^2}.$$

But the left-hand side is the test statistic associated with the intersection
procedure (over all k) and the right-hand side is p times the standard test statistic.
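Scheffé's lemma can be checked numerically in two dimensions. The sketch below is only an illustration under assumed names (`quad_form_inv_2x2`, `ratio`, `max_ratio_on_grid`): it compares $d'V^{-1}d$ with the largest value of $(k'd)^2/(k'Vk)$ over a grid of directions, and with the value at $k = V^{-1}d$, where the supremum is attained.

```python
import math

def quad_form_inv_2x2(d, V):
    """d' V^{-1} d for a symmetric positive definite 2x2 matrix V."""
    (v11, v12), (v21, v22) = V
    det = v11 * v22 - v12 * v21
    return (v22 * d[0] ** 2 - (v12 + v21) * d[0] * d[1] + v11 * d[1] ** 2) / det

def ratio(k, d, V):
    """(k'd)^2 / (k'Vk) for 2-vectors k and d."""
    kd = k[0] * d[0] + k[1] * d[1]
    kVk = (V[0][0] * k[0] ** 2 + (V[0][1] + V[1][0]) * k[0] * k[1]
           + V[1][1] * k[1] ** 2)
    return kd * kd / kVk

def max_ratio_on_grid(d, V, steps=3600):
    """Largest ratio over directions k = (cos a, sin a), a in [0, pi).

    Half a turn suffices because the ratio is unchanged when k changes sign.
    """
    return max(
        ratio((math.cos(math.pi * i / steps), math.sin(math.pi * i / steps)), d, V)
        for i in range(steps)
    )
```

For d = (1, 2) and V = [[2, 1], [1, 3]] the quadratic form is 1.4, the direction $V^{-1}d = (0.2, 0.6)$ attains it, and no grid direction exceeds it.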

Any intersection procedure can be treated as an approximation to some 
standard procedure. The intersection region will always contain the standard 
region and will converge to it as more linear combinations are used. The gain 
in simplicity of the component regions may more than compensate for the large 
number of regions and the imperfect approximation. 

The use of $A = pF_{p,\nu;\alpha}$ has one advantage over all other choices,
approximate or exact, for a finite r. The distribution theory of the
$\{T_{\lambda i}\}$ and related statistics is valid only if the vectors $\{k_i\}$
are chosen independently of $d_\lambda$ and s. But Theorem 4.2 is based on all
linear combinations and thus can be used for $\{k_i\}$ selected after studying the
data. A useful a posteriori choice of linear combinations will be illustrated in
the application in section seven. (This advantage is a primary motivation of
Scheffé's [12] multiple comparison procedure.)

When the equations defining $\lambda$ are homogeneous linear functions of the
regression coefficients, confidence regions for $\lambda$ can have shapes and
behavior not occurring in classical confidence regions. Several such properties
follow from Theorem 4.3.

THEOREM 4.3. If for all $\lambda$, each component of $\delta_\lambda$ is a
homogeneous linear function of the regression coefficients of the linear model,
then for any sample there is a nonzero $\alpha^*$ which will depend on the sample
but not on $\lambda$, such that the standard confidence region for $\lambda$ is the
entire space for any error level less than $\alpha^*$.

COROLLARY. The theorem holds, with a possibly different $\alpha^*$, for the
intersection confidence region.

The theorem follows easily from the well-known interpretation of the test
statistic $T_\lambda$ of the general linear hypothesis, that

$$T_\lambda = \frac{(S_\lambda - S_0)/p}{S_0/\nu}$$

where $S_0 = \nu s^2$ is the sum of squares of residuals after the least squares fit
of the linear model to the data, while $S_\lambda$ is the sum of squares of
residuals after the best least squares fit subject to the restriction of the
hypothesis $H_\lambda : \delta_\lambda = 0$ (cf. Wilks [14]). One of the possible
fits satisfying the homogeneous restrictions $\delta_\lambda = 0$ is that with all
regression coefficients estimated to be zero, leaving a residual sum of squares
$\sum z^2$, the original sum of squares. Consequently, $S_\lambda \le \sum z^2$ for
all $\lambda$ and for any sample, $T_\lambda$ is bounded by a constant depending on
the sample but not on $\lambda$. Since $F_{p,\nu;\alpha}$ approaches infinity as
$\alpha$ approaches zero, for the sample z there is a nonzero $\alpha^*(z)$ for
which $H_\lambda$ will be accepted for all $\lambda$ at any significance level
$\alpha < \alpha^*(z)$, and the corresponding confidence region will be the entire
space. The corollary follows using the corollary to Theorem 4.2.

The theorem shows that the confidence regions need not be bounded. Since
the value(s) of $\lambda$ for which $T_\lambda$ is maximum will generally be finite,
the confidence region as a function of the error level will close in around the
maximizing point(s), and the resulting region will be neither convex nor simply
connected and perhaps not even connected. (In the usual problems with means and
regression coefficients, the hypotheses are not homogeneous and, what is essential,
the constant term depends on $\lambda$.)


5. Geometry of intersection regions for a class of equations linear in $\lambda$.
Further study of intersection regions requires specifying the form in $\lambda$ of
the defining equations. An interesting class of equations is suggested by the
problem of locating the maximum of a quadratic regression surface (section seven).
Suppose that $\lambda$ is a p-dimensional vector and that the equation
$\delta_\lambda = 0$ is linear and homogeneous in the regression coefficients and
linear in $\lambda$. Introduce the notation
$\delta_\lambda = \gamma + \Gamma\lambda$ in which the elements of the p-vector
$\gamma$ and the $p \times p$ matrix $\Gamma$ are homogeneous linear functions of
the regression coefficients. Let c and C be the corresponding least squares
estimates of $\gamma$ and $\Gamma$ and let $d_\lambda = c + C\lambda$. The
covariance matrix $\sigma^2 V_\lambda$ of $d_\lambda$ is an inhomogeneous quadratic
function of $\lambda$. The particular forms of $\gamma$ and $\Gamma$ are of no
interest except for the evaluation of $d_\lambda$ and $V_\lambda$ (a tedious but
straightforward task) and to verify the two assumptions: Assume that $V_\lambda$ is
nonsingular for all $\lambda$. Assume that the (random) matrix C is nonsingular
with probability one.

Then a unique solution $\hat\lambda$ (the maximum likelihood estimate of
$\lambda$) of $d_\lambda = 0$ will exist. $\Gamma$ may be singular for a particular
set of population regression coefficients, so that the "population value of
$\lambda$" need not be unique or even exist. (If no solution of
$\delta_\lambda = 0$ exists, the confidence problem is vacuous.)

The development of this section concerns geometric properties of intersection
regions. Throughout, the observed sample will be held fixed, arbitrarily, except
for the above mentioned set of probability zero.

The component region $R_i$ based on the combination $k_i' d_\lambda$ is
$\{\lambda : h_i(\lambda) \le 0\}$ with quadratic boundary equation
$h_i(\lambda) = (k_i' d_\lambda)^2 - A s^2 (k_i' V_\lambda k_i) = 0$. Since
$V_\lambda$ is positive definite for all $\lambda$, all points on the hyperplane
$k_i' d_\lambda = 0$ (call it $M_i$) lie in the interior of $R_i$. The two parts
(if they exist) of the exterior (complement) of $R_i$ on either side of $M_i$ are
each convex (Theorem 9.1).

Thus, $R_i$ is the region between the sheets of a two-sheeted hyperboloid, the
exterior of an ellipsoid, or limiting and transitional forms of these. The boundary
(call it $F_i$) can never be one of the one-sheeted hyperboloids.

Considered as a function of the critical value A (or of the significance level),
$R_i$ is the hyperplane $M_i$ when A = 0 and expands monotonically with A, first
as a "curvilinear slab" between the two sheets of a hyperboloid of small
curvature, then widening and curving until it eventually becomes the outside of an
ellipsoid and finally fills the entire space. By Theorem 4.3, this last will occur
for a finite A.

The component region will be most easily described and comprehended when
it is a "curvilinear slab," being almost a confidence interval for the linear
combination $k_i' d_\lambda$. In any case, the computations involved in using
$R_i$ are relatively simple.

The component regions are studied as a preliminary to forming intersection
regions. In this section, restrict attention to p linearly independent combinations.
The intersection of p slabs is a parallelepiped. At best, the component regions
are bounded by hyperboloids of small curvature, and the intersection of p such
regions is a "curvilinear parallelepiped." The 2p "faces" become concave and
the $2^p$ corners are moved at least enough that the $2^{p-1}$ corners on any
"face" are not coplanar.

Let $R_I = \bigcap R_i$ be the intersection region determined by
$k_1, \ldots, k_p$ and let $J = \bigcap F_i$ be the intersection of the boundaries.
The points of J, the "corners" of $R_I$, are the solutions of the p simultaneous
quadratic equations

$$\{h_i(\lambda) = 0,\ i = 1, \ldots, p\}.$$

Except with probability zero, J will contain not more than $2^p$ points. Let
$R^1$ be the convex closure of J. $R^1$ is a convex polyhedron, all of whose
vertices are points of J (not necessarily conversely).

In the intuitive discussion above, $R_I$ would be contained in $R^1$ so that
$R^1$ would be a conservative and perhaps close approximation to $R_I$. But the
approximation depends critically on $R_I$ being a "curvilinear parallelepiped." If
$R_I$ is unbounded, disconnected, or otherwise misshaped, the formal calculation
of $R^1$ will usually lead to a clearly bad approximation, but it can lead to an
apparently good but erroneous approximation. Complex shaped intersection
regions may arise because of poor choices of components (e.g., too highly
correlated) or because of the inadequacy of the sample data in accurately
determining $\lambda$. The $R^1$ approximation can be so bad that it has no points
in common with $R_I$ except the corners. This will occur if all the corners lie on
the part of one boundary (say $F_i$) on one side of its hyperplane $M_i$, as will
necessarily occur if any $R_i$ is an ellipsoid (Theorem 9.2).

A positive result, representing the behavior understood in the intuitive
discussion, is that $R^1$ contains $R_I$ if there are exactly $2^p$ corners, one in
each of the $2^p$ parts of p-space formed by the p hyperplanes $\{M_i\}$
(Theorem 9.3).

To use $R^1$, one must find all corners of $R_I$. There is need for some simpler
approximations, to provide a first approximation for the complicated calculations
of the corners of $R_I$, and also to be easier to describe and use than is $R_I$.
Two simple polyhedral approximations $R^2$ and $R^3$ are suggested, both based on
the idea that the boundaries $\{F_i\}$ are hyperboloids of small curvature, at
least in the region of interest.

$R^2$ is obtained by replacing each hyperboloid $F_i$ by a pair of tangent
hyperplanes and taking the convex polyhedron formed by these 2p hyperplanes. $R^3$
is the parallelepiped formed by approximating each $F_i$ by a pair of hyperplanes,
both parallel to the corresponding $M_i$. These approximations are easily
determined once the point of tangency or intersection on each boundary is chosen.
The suggested choice is the pair of points lying also on every $M_j$, $j \ne i$.
If, for each i, the two points lie on opposite sides of $M_i$ (the reverse is a
sure indication that $R_I$ is not a "curvilinear parallelepiped"), $R^2$ is always
contained in $R_I$ (Theorem 9.4). If the hyperboloids are nearly flat, the
approximation is good. By dilating the region until all corners are outside $R_I$,
a "conservative" approximation of the same convenient shape could be obtained.
$R^2$ or a dilation is probably the most useful of all the suggested regions.
$R^3$ is a much rougher approximation whose virtue is simplicity of shape and
computation.


6. On the computation of intersection regions and approximations for
equations linear in $\lambda$. The computations needed in the approximations of
section five are simplified by a change of coordinates. For a particular choice of
$k_1, \ldots, k_p$, and a particular sample, define new (oblique) coordinates for
the space of $\lambda$ by

$$\xi_i = k_i' d_\lambda; \qquad i = 1, \ldots, p.$$

The inverse transformation must be obtained, requiring the solution of p
simultaneous linear equations in $\{\lambda_i\}$. (The unique solution is
guaranteed by the assumed linear independence of the $\{k_i\}$ and nonsingularity
of C.)

The coordinate hyperplane $\xi_i = 0$ is $M_i$ and the maximum likelihood estimate
$\hat\lambda$ is at the origin $\xi = 0$. Each $h_i(\lambda)$ can be written in
$\xi$ coordinates as

$$h_i(\lambda) = g_i(\xi) = \xi_i^2 - A s^2 (k_i' V_\lambda k_i)$$

and is a quadratic function of the $\{\xi_j\}$.

The computation of $R^2$ and $R^3$ is immensely simplified. The pair of points
lying on $F_i$ and every $M_j$, $j \ne i$, have all $\xi$ coordinates but the ith
zero, and that given by the roots ($e_i^- < e_i^+$) of the quadratic
$g_i(0, \ldots, 0, \xi_i, 0, \ldots, 0) = 0$. If they have the same sign, the
intersection region, at least for the particular choice of components, is
dangerously complicated. If $e_i^- < 0 < e_i^+$, then the equation of the tangent
hyperplane approximation to the part of $F_i$ with $\xi_i > 0$ (say) is

$$0 = \sum_{j=1}^{p} \left[\frac{\partial g_i(\xi)}{\partial \xi_j}\right]_{\xi = (0, \ldots, e_i^+, \ldots, 0)} \xi_j \;-\; e_i^+ \left[\frac{\partial g_i(\xi)}{\partial \xi_i}\right]_{\xi = (0, \ldots, e_i^+, \ldots, 0)}$$

and is easily written down when $g_i(\xi)$ is given explicitly. $R^2$ is defined by
the 2p corresponding inequalities, each chosen so that $\xi = 0$ satisfies it.
$R^3$ is given as $\{\xi : e_i^- \le \xi_i \le e_i^+;\ i = 1, \ldots, p\}$. The
inequalities for $R^2$ or $R^3$ are easily changed back to linear inequalities in
$\lambda$ if desired.

The $R^1$, $R^2$ and $R^3$ constructions have no unique or natural extensions to
more than p component regions (in p dimensions). One possible procedure would be
to repeat the $R^2$ construction for another set of p components (perhaps with
some overlaps), using a new set of coordinates, then converting both sets of
inequalities to some one convenient coordinate system. The approximate region
would then be the convex polyhedron defined by all of the inequalities. Some
dilation of the $R^2$ region as discussed in section five would be desirable to
prevent serious underapproximation of the confidence region.


7. An application to the location of the maximum of a quadratic regression 
surface. Many aspects of the problem of determining the values of the input 
variables of a process to yield a maximum response have been studied by Box 
and colleagues ([1], [2], [3], [4], [5]). Here we use the simple model in which each observed response z is distributed normally and independently with variance $\sigma^2$ and mean

(7.1) $E(z) = \gamma_0 + \gamma' y + \tfrac{1}{2} y' \Gamma y$

with input variables $y' = (y_1, \cdots, y_p)$ and regression coefficients $\gamma_0$, $\gamma' = (\gamma_1, \cdots, \gamma_p)$, $\Gamma = [\gamma_{ij}]$ with $\gamma_{ij} = \gamma_{ji}$. n sets of (z, y) comprise the observed data and the model is a "general linear model" of section four with $m = 1 + p + p(p+1)/2$ regression coefficients. Denote by $c_0$, c, C and $s^2$ the least squares estimates of $\gamma_0$, $\gamma$, $\Gamma$ and $\sigma^2$. The fitted surface is

(7.2) $\hat z = c_0 + c' y + \tfrac{1}{2} y' C y$.

The estimates and their variances and covariances can be computed from the formulas of section four or from those given by Box and Wilson [4].

If the surface (7.1) has a maximum at $y = \lambda$, $\lambda$ will satisfy the stationarity equation $H_\lambda$: $\gamma + \Gamma\lambda = 0$. This equation will be satisfied by any vertex (maximum, minimum or saddle point) of the surface and all confidence regions are for the location of a vertex, type unspecified. Box and Hunter [2] construct the standard confidence region and show that if the region is bounded, then the region can be said to represent one particular type of vertex in the sense that for every $\lambda$ in the region and for each fitted surface with vertex $\lambda$ that does not give a "significantly poor fit," the vertex is of the same type. Their argument extends to intersection regions based on joint tests of the form of equation (4.2).

From the structure of $d_\lambda = c + C\lambda$ it follows that even with the maximum of balance and symmetry in the design, each diagonal element of the covariance matrix $V_\lambda$ of $d_\lambda$ has at least a constant term and all p square terms. Each off-diagonal element has at least a term in $\lambda_i \lambda_j$. Consequently, in the equation (4.1) for the standard confidence region $R_S$, each term will generally be a polynomial in the $\{\lambda_i\}$ of degree 2p. Even for p = 2, the equation is already unwieldy. Box and Hunter [3] show that with a rotatable design, the equation can be reduced to a quartic for any p.

$H_\lambda$ is exactly of the form studied in section five and the nonsingularity assumptions for $V_\lambda$ and C are met. Intersection region procedures are applicable. The simplest choice of linear combinations is the direct use of the p components
of d, . With this choice, the critical value A = F;,.«;, could be used to give an 
error level bounded by a. 

A better choice is suggested by a canonical analysis of the fitted surface such as is obtained by introducing new coordinates $\{x_i\}$ with origin at the center and with axes the principal axes of the fitted surface (7.2). If $\{m_i\}$ and $\{b_{ii}\}$ are the eigenvectors and corresponding eigenvalues of C and if $b_i = m_i' c$, then $x_i = m_i' y + b_i/b_{ii}$, and the fitted surface becomes $\hat z = \text{constant} + \tfrac{1}{2} \sum_i b_{ii} x_i^2$. (Box and Wilson [4] and Box [1] give more details and interpretations.)

The suggested choice for $k_i$ is $m_i/b_{ii}$, which corresponds to using the separate stationarity equations found by equating to zero the partial derivatives of the true surface in the directions of the principal axes of the fitted surface. This set being dependent on the data, the only simple valid critical value is $A = pF_{p,\nu;\alpha}$ (with which arbitrarily many more combinations can be used without increasing the error level of the intersection region). With $k_i = m_i/b_{ii}$, the linear form defining $\xi_i$ and used for the component region $R_i$ is

(7.3) $\xi_i = k_i' d_\lambda = m_i' d_\lambda / b_{ii} = m_i' c / b_{ii} + m_i' C \lambda / b_{ii} = b_i / b_{ii} + m_i' \lambda$.


Thus, the transformation to the $\xi$ coordinate system, useful in the computation of intersection region approximations, is here identical to the transformation to the principal coordinates $\{x_i\}$ useful in understanding the shape of the regression surface.
The intersection region $R_I$ and the approximations $R^1$, $R^2$ and $R^3$ will be illustrated on the numerical example given by Box and Hunter [2] to illustrate $R_S$. For the fitted surface $c_0 = 77.95$, $s^2 = 1.07$, $\nu = 9$,

$$c = \begin{pmatrix} 3.76 \\ -1.57 \end{pmatrix}, \qquad C = \begin{pmatrix} -5.74 & 3.84 \\ 3.84 & -5.28 \end{pmatrix}$$

with covariance matrix given by equation (27) of [2]. The fitted surface has a maximum at $\hat\lambda' = (.889, .349)$. The surface in the principal coordinates x is:

$$\hat z = 79.35 - \tfrac{1}{2}(9.357\, x_1^2 + 1.663\, x_2^2)$$

and the transformation of coordinates is

(7.4) $x_1 = .7279\, y_1 - .6857\, y_2 - .4076$, $\quad x_2 = .6857\, y_1 + .7279\, y_2 - .8631$.

The transformation consists of a translation of the point $\hat\lambda$ to the origin and a rotation through $-43^\circ 17.2'$.
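
The canonical analysis of this example can be checked numerically. The following is a minimal Python sketch, not part of the original paper: it assumes only the printed estimates $c_0$, c and C, and the helper names (`stationary_point`, `eigen_2x2`) are illustrative. Agreement with the printed figures can be expected only up to the rounding of the printed coefficients.

```python
import math

# Least squares estimates as printed in the text (Box and Hunter example).
c0 = 77.95
c = (3.76, -1.57)
C = ((-5.74, 3.84), (3.84, -5.28))


def stationary_point(c, C):
    """Solve c + C*lam = 0 for a 2x2 matrix C (Cramer's rule)."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    lam1 = -(C[1][1] * c[0] - C[0][1] * c[1]) / det
    lam2 = -(-C[1][0] * c[0] + C[0][0] * c[1]) / det
    return (lam1, lam2)


def eigen_2x2(C):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix
    with nonzero off-diagonal element, smallest eigenvalue first."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    mean = (a + d) / 2.0
    half_gap = math.hypot((a - d) / 2.0, b)
    pairs = []
    for ev in (mean - half_gap, mean + half_gap):
        vx, vy = b, ev - a          # solves (C - ev*I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((ev, (vx / norm, vy / norm)))
    return pairs


lam_hat = stationary_point(c, C)
eigs = eigen_2x2(C)                                   # [(b_11, m_1), (b_22, m_2)]
z_max = c0 + 0.5 * (c[0] * lam_hat[0] + c[1] * lam_hat[1])
# Constant terms b_i / b_ii of x_i = m_i'y + b_i / b_ii
shifts = [(m[0] * c[0] + m[1] * c[1]) / ev for ev, m in eigs]
# Direction of the first principal axis relative to the y_1 axis, in degrees
angle_deg = math.degrees(math.atan2(eigs[0][1][1], eigs[0][1][0]))

print(lam_hat, eigs, z_max, shifts, angle_deg)
```

Up to rounding, the printed output corresponds to the maximum, the eigenvalues, the constant 79.35, and the coefficients of (7.4). The sign of each eigenvector is a convention of the sketch.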

Using the second set of recommended $\{k_i\}$ and the critical value $A = 2F_{2,9;.05} = 8.52$, the transformation (7.3) from the $\lambda$ to $\xi$ coordinates is identical with the transformation (7.4) from y to x coordinates. The boundary equations in the $\xi$ coordinates of the two component regions $R_1$ and $R_2$ are:

$$g_1(\xi) = -.035 - .014\,\xi_1 - .054\,\xi_2 + .929\,\xi_1^2 - .045\,\xi_2^2 + .035\,\xi_1\xi_2$$

$$g_2(\xi) = -.433 + .048\,\xi_1 - .468\,\xi_2 - 1.414\,\xi_1^2 + .514\,\xi_2^2 + .709\,\xi_1\xi_2$$

The corners of the intersection region $R_I$ are (-.88, 3.10), (.43, 1.31), (-.12, -.55), (.16, -.66) of which one lies in each quadrant, satisfying the hypothesis of Theorem 9.3, so that $R_I$ is contained in the quadrilateral $R^1$ formed as convex closure of these points. The parallelepiped approximation $R^3$ is given by the inequalities $(-.19 \le \xi_1 \le .20)$, $(-.57 \le \xi_2 \le 1.48)$ and the polyhedral approximation $R^2$ is the intersection of the four tangent half spaces:

$T_1^- = \{\xi: .067 + .360\,\xi_1 + .061\,\xi_2 \ge 0\}$

$T_1^+ = \{\xi: .073 - .360\,\xi_1 + .047\,\xi_2 \ge 0\}$

$T_2^- = \{\xi: .599 + .355\,\xi_1 + 1.053\,\xi_2 \ge 0\}$

$T_2^+ = \{\xi: 1.558 - 1.097\,\xi_1 - 1.053\,\xi_2 \ge 0\}$


The regions $R_1$, $R_2$, $R_I$, $R^1$, $R^2$, $R^3$, and $R_S$ (the latter taken from [2], p. 198) are illustrated in Figure 2. (In two dimensions, the approximations to $R_I$ are unnecessary except for simple analytic description and are shown principally to illustrate the different approximations.)
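
The parallelepiped bounds and the tangent half spaces of this example follow mechanically from the two boundary quadratics, as described in section six. The sketch below is illustrative Python, not the author's computation. It assumes the coefficients of $g_1$ and $g_2$ as read from the boundary equations above, with the assignment of the last three coefficients to $\xi_1^2$, $\xi_2^2$ and $\xi_1\xi_2$ taken as an assumption that is consistent with the printed bounds and half spaces. The function names are hypothetical.

```python
import math

# g(x1, x2) = k + a1*x1 + a2*x2 + q11*x1^2 + q22*x2^2 + q12*x1*x2  (assumed layout)
G1 = dict(k=-0.035, a1=-0.014, a2=-0.054, q11=0.929, q22=-0.045, q12=0.035)
G2 = dict(k=-0.433, a1=0.048, a2=-0.468, q11=-1.414, q22=0.514, q12=0.709)


def axis_roots(g, axis):
    """Roots (e_minus, e_plus) of g restricted to one coordinate axis."""
    q = g["q11"] if axis == 0 else g["q22"]
    a = g["a1"] if axis == 0 else g["a2"]
    root = math.sqrt(a * a - 4.0 * q * g["k"])
    r1, r2 = (-a - root) / (2.0 * q), (-a + root) / (2.0 * q)
    return (min(r1, r2), max(r1, r2))


def gradient(g, x1, x2):
    """Partial derivatives of g at (x1, x2)."""
    return (g["a1"] + 2.0 * g["q11"] * x1 + g["q12"] * x2,
            g["a2"] + 2.0 * g["q22"] * x2 + g["q12"] * x1)


def tangent_half_space(g, axis, root):
    """Coefficients (k0, k1, k2) of the half space k0 + k1*x1 + k2*x2 >= 0
    bounded by the tangent line to g = 0 at the axis point, oriented so that
    the origin satisfies the inequality."""
    point = (root, 0.0) if axis == 0 else (0.0, root)
    gx, gy = gradient(g, *point)
    k0 = -(gx * point[0] + gy * point[1])
    sign = 1.0 if k0 >= 0 else -1.0
    return (sign * k0, sign * gx, sign * gy)


# Parallelepiped: e_i^- <= xi_i <= e_i^+ for i = 1, 2
box = [axis_roots(G1, 0), axis_roots(G2, 1)]
# Tangent half spaces in the order T_1^-, T_1^+, T_2^-, T_2^+
half_spaces = ([tangent_half_space(G1, 0, e) for e in box[0]]
               + [tangent_half_space(G2, 1, e) for e in box[1]])

print(box)
print(half_spaces)
```

The results agree with the printed inequalities to within the rounding of the printed coefficients of $g_1$ and $g_2$.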


8. Bounds on the error level of intersection procedures. A lower bound on the 
error level can be obtained that gives some indication of the closeness of the 
bound in Theorem 4.1. 

Fix A, let $u_i = x_i/s$ with $x_i$ defined by equation (4.3) and let $\rho_{ij}$ = correlation $(x_i, x_j)$. The joint distribution under $H_0$ of the $\{u_i\}$ is an r-variate generalization of the t distribution (cf. Dunnett and Sobel [6]) with $\nu$ degrees of freedom and correlation matrix $[\rho_{ij}]$ of the associated r-variate normal distribution. Denote the bivariate distribution integrals by

$$d_\nu(a, b, \rho_{ij}) = P(u_i > a,\ u_j > b)$$

$$f_\nu(a, b, \rho_{ij}) = P(|u_i| > a,\ |u_j| > b)$$

[Figure 2: ... of the maximum ... the quadratic regression example of section seven]

K. Pearson [10, Tables VIII and IX] gives $d_\infty(a, b, \rho)$ for a selection of a, b, $\rho$ and Dunnett and Sobel [6] give something simply related to $d_\nu(a, a, \pm 0.5)$ for a selection of a and $\nu$, and formulas for computing other values. The marginal $f_\nu(a, 0, \rho)$ is the double-tail probability of Student's t distribution and is independent of $\rho$.

The error level of the joint test is

$$P(U_I > A) = P\left(\max_{1 \le i \le r} |u_i| > \sqrt{A}\right).$$

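Bounds on this probability are given in Theorem 8.1.

THEOREM 8.1.

(1) $P\left(\max_{1 \le i \le r} |u_i| \ge a\right) \le r f_\nu(a, 0, -)$

(2) $P\left(\max_{1 \le i \le r} |u_i| \ge a\right) \ge r f_\nu(a, 0, -) - \sum_{1 \le i < j \le r} f_\nu(a, a, \rho_{ij})$

(3) $P\left(\max_{1 \le i \le r} |u_i| \ge a\right) \ge r f_\nu(a, 0, -) - \binom{r}{2}\left[\left(1 - \frac{\rho_1}{\rho_0}\right) f_\nu(a, a, 0) + \frac{\rho_1}{\rho_0} f_\nu(a, a, \rho_0)\right]$

in which $\rho_0 = \max |\rho_{ij}|$ and $\rho_1 = \sum_{i<j} |\rho_{ij}| \big/ \binom{r}{2}$.

Equality occurs in (1) if r = 1 and in (2) and (3) if r = 2.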

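Inequalities (1) and (2) are direct applications of Bonferroni's inequalities (cf. Feller [7]) to the events $\{|u_i| \ge a\}$. The inequality (3) follows on combining with inequality (2) the symmetry and convexity of $f_\nu(a, a, \rho)$ proved in the lemma below. For since $0 \le |\rho_{ij}| \le \rho_0$,

$$f_\nu(a, a, \rho_{ij}) = f_\nu(a, a, |\rho_{ij}|) \le \left(1 - \frac{|\rho_{ij}|}{\rho_0}\right) f_\nu(a, a, 0) + \frac{|\rho_{ij}|}{\rho_0} f_\nu(a, a, \rho_0)$$

$$\sum_{i<j} f_\nu(a, a, \rho_{ij}) \le \binom{r}{2}\left[\left(1 - \frac{\rho_1}{\rho_0}\right) f_\nu(a, a, 0) + \frac{\rho_1}{\rho_0} f_\nu(a, a, \rho_0)\right].$$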


LEMMA. $f_\nu(a, a, \rho)$ is a symmetric, convex function of $\rho$.

Use $f_\nu(a, a, \rho) = 2d_\nu(a, a, \rho) + 2d_\nu(a, a, -\rho)$. Write each $d_\nu$ as the double integral of the bivariate t-density, change to "elliptical polar coordinates" ([6], p. 154), and integrate out the angle. Straightforward calculation of $\partial^2 f_\nu / \partial \rho^2$ shows convexity.

Table 8.1 gives the upper bound (1) and lower bound (3) for a selection of values of a, r, $\nu$, $\rho_0$, $\rho_1$ chosen to give upper bound near .05 or .01. The upper bound would seem to be sufficiently accurate for most uses provided the correlations and r are not very large.
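
For the special case $\nu = \infty$ with all correlations zero, the bounds are elementary, because the $u_i$ are then independent standard normal variables and $f_\infty(a, a, 0) = f_\infty(a, 0, -)^2$. The following Python sketch is an illustration under exactly those assumptions and does not cover finite $\nu$ or nonzero correlations; the function names are hypothetical.

```python
import math


def two_tail_normal(a):
    """P(|Z| > a) for standard normal Z, i.e. f_nu(a, 0, -) with nu infinite."""
    return math.erfc(a / math.sqrt(2.0))


def bounds_independent_normal(r, a):
    """Return (lower, exact, upper) for P(max |u_i| > a) when the u_i are
    independent standard normal variables."""
    f1 = two_tail_normal(a)
    upper = r * f1                          # inequality (1)
    pairs = r * (r - 1) // 2
    lower = upper - pairs * f1 * f1         # inequality (2) with every rho_ij = 0
    exact = 1.0 - (1.0 - f1) ** r           # independence gives the exact level
    return lower, exact, upper


for r, a in ((2, 2.2), (3, 2.4), (5, 2.6), (10, 2.8)):
    lower, exact, upper = bounds_independent_normal(r, a)
    print(r, a, round(lower, 4), round(exact, 4), round(upper, 4))
```

In this special case the lower and upper bounds bracket the exact error level closely, which illustrates the remark that the upper bound is accurate when r and the correlations are not large.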
TABLE 8.1
Bounds on P(max_{1≤i≤r} |u_i| > a) from Theorem 8.1

                                            Lower Bound (3)
                  Upper       ρ0 = 0     ρ0 = .5               ρ0 = .8
  r    ν    a     Bound (1)   ρ1 = 0     ρ1 = .2   ρ1 = .5     ρ1 = .2   ρ1 = .5   ρ1 = .8

  2    ∞   2.2    .056        .055*      --        .052*       --        --        .045*
  3    ∞   2.4    .049        .048*      .046      .043        --        .038      .031
  5    ∞   2.6    .047        .046*      .043      .038        .038      .027      .016
  8    ∞   2.7    .055        .054*      .048      .039        .039      .016      <0
 10    ∞   2.8    .051        .050*      .043      .033        .033      .007      <0

  2    ∞   2.8    .0102       .0102*     --        .0098*      --        --        .0080*
  3    ∞   2.9    .0112       .0112*     .0109     .0104       --        .0092     .0080
  5    ∞   3.1    .0097       .0096*     .0092     .0086       .0084     .0065     .0046
  8    ∞   3.2    .0110       .0109*     .0102     .0092       .0086     .0050     .0014
 10    ∞   3.3    .0097       .0096*     .0089     .0079       .0071     .0033     <0

  2   14   2.5    .051        .049*      --        .046*
  5   14   3.0    .048        .044       .039      .033
  3    8   3.0    .051        .047       .044      .040
  2    6   3.0    .048        .045*      --        .042*
  5    7   3.5    .050        .041       .036      .029

* Exact value.
-- Impossible ρ0, ρ1 combination.


9. Mathematical results for the geometry of section five. The notation of sections five and six will be used, and all results are for a fixed sample. For terminological convenience, all work will be done in terms of a Euclidean p-space $E_p$ with rectangular coordinates $\xi$. The affine transformation of the space does not affect the properties of interest.

Let $\mathrm{Var}\ \xi_i = \sigma^2 (k_i' V_\lambda k_i)$ be the variance of the linear form defining $\xi_i$, transformed for the particular sample to a function of $\xi$. If $S_i$ is any set in $E_p$ indexed by i, let

$$S_i^+ = S_i \cap \{\xi: \xi_i > 0\}, \quad S_i^- = S_i \cap \{\xi: \xi_i < 0\}, \quad S_i^0 = S_i \cap \{\xi: \xi_i = 0\}.$$

Let $S^* = E_p - S$, and $\bar S$ = closure of S.

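THEOREM 9.1. $R_i^{*+}$ and $R_i^{*-}$ are convex.

To prove $R_i^{*+}$ convex, it is sufficient to show that for any two points $\xi_1$ and $\xi_2$ in $R_i^{*+}$ and any constant $\theta$ such that $0 < \theta < 1$, $\xi_0 = \theta\xi_1 + (1 - \theta)\xi_2$ is in $R_i^{*+}$. But

$$\xi \in R_i^{*+} \iff \{g_i(\xi) > 0 \text{ and } \xi_i > 0\}.$$

$\xi_{i0} > 0$ is immediate. Expanding $g_i(\xi_0)$, using the Cauchy inequality for covariance, $g_i(\xi_j) > 0$ for j = 1, 2 and $\xi_{i1}\xi_{i2} > 0$,

$$g_i(\xi_0) = \theta^2[\xi_{i1}^2 - (As^2/\sigma^2)\,\mathrm{Var}\ \xi_{i1}] + (1 - \theta)^2[\xi_{i2}^2 - (As^2/\sigma^2)\,\mathrm{Var}\ \xi_{i2}] + 2\theta(1 - \theta)[\xi_{i1}\xi_{i2} - (As^2/\sigma^2)\,\mathrm{Cov}(\xi_{i1}, \xi_{i2})]$$

$$\ge \theta^2 g_i(\xi_1) + (1 - \theta)^2 g_i(\xi_2) + 2\theta(1 - \theta)\left[\xi_{i1}\xi_{i2} - (As^2/\sigma^2)\sqrt{(\mathrm{Var}\ \xi_{i1})(\mathrm{Var}\ \xi_{i2})}\right]$$

$$> 2\theta(1 - \theta)\left[\xi_{i1}\xi_{i2} - (\xi_{i1}^2 \xi_{i2}^2)^{1/2}\right] = 0.$$

COROLLARY. $\bar R_i^{*+}$ and $\bar R_i^{*-}$ are convex.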
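THEOREM 9.2. If for some i and some sign (say +), $J \cap F_i^+ = J$, then $R_I \cap R^1 \subset F_i^+$.

By hypothesis, $J \subset F_i^+ \subset \bar R_i^{*+}$. By the corollary to Theorem 9.1, $\bar R_i^{*+}$ is convex and closed so that $R^1 \subset \bar R_i^{*+}$.

$$R_I \cap R^1 \subset R_i \cap \bar R_i^{*+} \subset R_i \cap (R_i^{*+} \cup F_i^+) \subset R_i \cap F_i^+ \subset F_i^+.$$

COROLLARY. If any boundary $F_i$ is an ellipsoid, $R_I \cap R^1 \subset F_i$.

The entire ellipsoid must be on one side of the plane $\xi_i = 0$.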

THEOREM 9.3. If J contains exactly $2^p$ points, with one point in each of the $2^p$ open orthants formed by the p coordinate hyperplanes $M_1, \cdots, M_p$, then $R_I \subset R^1$.

Three lemmas will be proved, from which the theorem follows immediately.

By hypothesis, J contains $2^p$ points, one in each open orthant defined by the p coordinate planes $\{\xi_i = 0\}$. Denote the $2^p$ points ("corners") by

$$\{e_u = (e_{u1}, \cdots, e_{up});\ u = 0, \cdots, 2^p - 1\}$$

with the subscript u assigned so that if $[u_1, \cdots, u_p]$ is the binary expansion of u, then $e_{uj} > 0$ ($e_u \in M_j^{*+}$) if $u_j = 1$ and $e_{uj} < 0$ ($e_u \in M_j^{*-}$) if $u_j = 0$. The point $e_u$ is in the diagonally opposite orthant from $e_{2^p-1-u}$. The "diagonal line" $D(u, 2^p - 1 - u)$ through $e_u$ and $e_{2^p-1-u}$ can be parametrized as

$$\{\xi: \xi = \theta e_u + (1 - \theta) e_{2^p-1-u}\}$$

and is the union of three disjoint parts: the "diagonal segment" $D^0(u, 2^p - 1 - u)$ with $0 \le \theta \le 1$, and two "outer diagonals" $D(u)$ with $\theta > 1$ and $D(2^p - 1 - u)$ with $\theta < 0$. $D(u)$ is contained in the same open orthant as $e_u$. Define

$$Q_i^+ = \text{convex hull of } \bigcup_{\{u:\, u_i = 1\}} D(u)$$

$$Q_i^- = \text{convex hull of } \bigcup_{\{u:\, u_i = 0\}} D(u)$$

$$Q = \bigcup_{i=1}^{p} (Q_i^+ \cup Q_i^-).$$

LEMMA 1. Under the conditions of Theorem 9.3, $R_I \subset Q^*$.

For each value of u and i, the diagonal $D(u, 2^p - 1 - u)$ intersects the boundary $F_i$ in the two points $e_u$ and $e_{2^p-1-u}$. The diagonal segment crosses each coordinate hyperplane $M_i$. Since the boundary equation of $R_i$ is quadratic, the segment $D^0(u, 2^p - 1 - u)$ is in $R_i$ and the outer diagonal $D(u)$ is in $R_i^*$. Then

$$\bigcup_{\{u:\, u_i = 1\}} D(u) \subset R_i^{*+} \quad \text{and} \quad \bigcup_{\{u:\, u_i = 0\}} D(u) \subset R_i^{*-}.$$

By Theorem 9.1, $R_i^{*+}$ and $R_i^{*-}$ are convex, so $Q_i^+ \subset R_i^{*+} \subset R_i^*$ and $Q_i^- \subset R_i^{*-} \subset R_i^*$ for every i. Then $Q \subset \bigcup_i R_i^* = R_I^*$ and $R_I \subset Q^*$.

LEMMA 2. Under the conditions of Theorem 9.3, $R^1$ contains a cube with the origin in the interior.

By induction on the dimension p, the cube with faces $|\xi_i| = a = \min_{u,i} |e_{ui}|$ will be shown to be in $R^1$. Let $\xi^0$ be any point with $|\xi_i^0| \le a$ for all i. If p = 1, then $e_0 \le -a \le \xi^0 \le a \le e_1$ for the one coordinate and $\xi^0$ is in the convex closure of $e_0$ and $e_1$.

For arbitrary p, let $e_u$ be an arbitrary corner. Suppose that $e_{up} > 0$. Let $e_{u'}$ be the corner with sign $e_{u'j}$ = sign $e_{uj}$ for $j < p$ and $e_{u'p} < 0$. Then $e_{u'p} \le -a \le \xi_p^0 \le a \le e_{up}$ and there is a convex linear combination $d_u = \theta_u e_u + (1 - \theta_u) e_{u'}$ with $0 \le \theta_u \le 1$ such that $d_{up} = \xi_p^0$. For the other coordinates, $|d_{uj}| \ge \min(|e_{uj}|, |e_{u'j}|) \ge a$ and sign $d_{uj}$ = sign $e_{uj}$. There are $2^{p-1}$ points $d_u$ with $u = (u_1, \cdots, u_{p-1}, 1)$, satisfying the conditions of the theorem on the p - 1 dimensional hyperplane $\xi_p = \xi_p^0$ with restricted $\min_{u,j} |d_{uj}| \ge a$. By the induction hypothesis, the point $\xi^0$ lies in the convex closure of the $\{d_u\}$ but since each of these $d_u$ is in the convex closure $R^1$, so also is $\xi^0$, completing the proof of the lemma.

LEMMA 3. Under the conditions of Theorem 9.3, $R^1 \supset Q^*$.

The lemma will be proved by defining an expansion of $R^1$ to the entire space using only points of Q.

By Lemma 2, $R^1$ contains a cube P with corners $(\pm a, \cdots, \pm a)$. Denote the corners of P by $\{c_u\}$ with $c_u$ in the same orthant as $e_u$. The cube P can be decomposed into disjoint open simplexes of dimension p and less, all vertices being corners of P and every face of every simplex in the collection. A simplex S of dimension q and with vertices $c_{u(1)}, \cdots, c_{u(q+1)}$ is defined as

$$S = \left\{\xi: \xi = \sum_{j=1}^{q+1} \theta_{u(j)} c_{u(j)};\ \sum_{j=1}^{q+1} \theta_{u(j)} = 1,\ \text{all } \theta_{u(j)} > 0\right\}.$$

Taking $\theta_u = 0$ for all $u \ne u(j)$ for any j, the $\{\theta_u,\ u = 0, \cdots, 2^p - 1\}$ are the barycentric coordinates of the point $\xi$ with respect to the simplicial decomposition. The barycentric coordinates are continuous functions over each closed simplex $\bar S$ and uniquely defined over P and so are continuous over P. (Cf. Lefschetz [9], p. 97.)

One such decomposition of the cube consists of the p! p-dimensional simplexes $S_i = \{\xi: -a < \xi_{i_1} < \cdots < \xi_{i_p} < a\}$ in which $i = (i_1, \cdots, i_p)$ is any permutation of the integers $(1, \cdots, p)$, and of all faces of the $\{S_i\}$. $S_i$ can be written as

$$S_i = \left\{\xi: \xi = \sum_{j=1}^{p+1} \theta_{u(i,j)} c_{u(i,j)};\ \sum_{j=1}^{p+1} \theta_{u(i,j)} = 1,\ \theta_{u(i,j)} > 0\right\}$$

in which $c_{u(i,j)}$ is that corner of P whose $i_k$th coordinate is $-a$ for $k < j$ and $+a$ for $k \ge j$. The correspondence between the two representations of $S_i$ is given by

$$\theta_{u(i,j)} = \frac{\xi_{i_j} - \xi_{i_{j-1}}}{2a}, \quad (j = 1, \cdots, p + 1;\ \xi_{i_0} = -a,\ \xi_{i_{p+1}} = +a).$$
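
The correspondence between the ordering of the coordinates and the barycentric coordinates can be made concrete. The following Python sketch is an illustration only, with hypothetical function names and 0-based coordinate indices. It assumes the reading of the display above in which $\theta_{u(i,j)} = (\xi_{i_j} - \xi_{i_{j-1}})/2a$ with $\xi_{i_0} = -a$ and $\xi_{i_{p+1}} = +a$, and uses exact rational arithmetic.

```python
from fractions import Fraction


def barycentric_in_cube(xi, a):
    """Barycentric coordinates and vertices of the simplex of the cube
    [-a, a]^p that contains the point xi (coordinates assumed distinct)."""
    p = len(xi)
    # order[k-1] is the (0-based) index i_k: coordinates sorted increasingly
    order = sorted(range(p), key=lambda k: xi[k])
    padded = [-a] + [xi[k] for k in order] + [a]
    thetas = [(padded[j] - padded[j - 1]) / (2 * a) for j in range(1, p + 2)]
    corners = []
    for j in range(1, p + 2):
        corner = [None] * p
        for k, idx in enumerate(order, start=1):
            corner[idx] = -a if k < j else a   # -a for k < j, +a for k >= j
        corners.append(tuple(corner))
    return thetas, corners


def combine(thetas, corners):
    """Convex combination sum_j theta_j * corner_j."""
    p = len(corners[0])
    return [sum(t * c[k] for t, c in zip(thetas, corners)) for k in range(p)]


point = [Fraction(1, 2), Fraction(-1, 4)]
thetas, corners = barycentric_in_cube(point, Fraction(1))
print(thetas, corners, combine(thetas, corners))
```

For this point the weights are 3/8, 3/8 and 1/4 on the corners (1, 1), (1, -1) and (-1, -1), and the convex combination reproduces the point.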
Any simplex lies entirely in a face, say $\{\xi_j = +a\}$, of P or else does not intersect that face. For if all vertices of S lie in $\{\xi_j = a\}$ then so does S, and if one or more vertices do not lie in $\{\xi_j = a\}$ then the jth coordinate of each point in S is less than $+a$. Let $\mathcal{S}$ be the collection of all simplexes lying in the face F of P. $\mathcal{S}$ is a disjoint simplicial decomposition of F and the barycentric coordinates are continuous over F. Each simplex in $\mathcal{S}$ lies in some single face of the cube. Let S with corners $c_{u(1)}, \cdots, c_{u(q)}$ be an arbitrary member of $\mathcal{S}$. Suppose that S lies in the face $\{\xi_j = a\}$ and hence in $M_j^{*+}$.

Define a deformation of F as follows. For $\xi \in S$ with $\xi = \sum_k \theta_{u(k)} c_{u(k)}$; $\sum_k \theta_{u(k)} = 1$; $\theta_{u(k)} > 0$; define

$$f_t(\xi) = \sum_k \theta_{u(k)} \left(t\, e_{u(k)} + (1 - t)\, c_{u(k)}\right), \quad 0 \le t \le 1,$$

$$f_t(\xi) = \sum_k \theta_{u(k)} \left(t\, e_{u(k)} + (1 - t)\, e_{2^p-1-u(k)}\right), \quad t \ge 1.$$


Let F(t) be the image of F under $f_t$. The deformation consists in moving each corner of the cube to the corresponding corner of $R_I$, then out the diagonals.

For each t, $f_t$ is a continuous mapping of F into $E_p$ since the barycentric coordinates are continuous over F. Further, the family of mappings is jointly continuous in t and $\xi$. $f_0$ is the identity mapping on F and $f_t$ is homotopic to $f_0$ for all t. Write $f_t/F \sim f_0/F$; /F to indicate the domain of the mapping and $\sim$ for the homotopy equivalence relation (cf. Lefschetz [9], p. 42).

For $0 \le t \le 1$, $F(t) \subset R^1$ by Lemma 2 and the convexity of $R^1$. For $t > 1$, $t\, e_{u(k)} + (1 - t)\, e_{2^p-1-u(k)}$ lies on the outer diagonal $D(u(k))$. Since the corners of S all lie in $M_j^{*+}$, so do the $\{e_{u(k)}\}$. Therefore $f_t(S) \subset Q_j^+$, the convex hull of all outer diagonals in $M_j^{*+}$. Finally then, $F(t) \subset Q$ for all $t > 1$ and

$$P \cup \Big(\bigcup_{t>0} F(t)\Big) \subset R^1 \cup Q.$$

The proof of the lemma will be completed if any point not in P can be shown to lie in F(t) for some $t > 0$.

The distance from the origin to the image S(t) of any simplex S of the face of the cube is never less than a and increases to infinity. For, if S lies in $M_j^{*+}$, then $c_{u(k)j} = a$, $e_{u(k)j} \ge a$ and $e_{2^p-1-u(k),j} \le -a$. Then for any point $\xi$ in S(t), $\xi_j \ge a$ if $t \le 1$ and $\xi_j \ge (2t - 1)a$ if $t > 1$.
If x is any point in $E_p$, denote by $\pi_x$ the mapping of $E_p - x$ onto the unit sphere centered at the origin, which maps $\xi$ into the projection from the origin of $\xi - x$ (vector subtraction). Denote by $\pi_x/B$ the mapping $\pi_x$ with domain restricted to the set B. The following topological theorem is needed:

THEOREM (Hurewicz and Wallman [8], Theorem VI-10). Let B be a closed bounded subset of $E_p$. Two points x, y neither contained in B are separated by B, if and only if the mappings $\pi_x/B$ and $\pi_y/B$ are not homotopic: $\pi_x/B \nsim \pi_y/B$.

Assume there is a point x not contained in $P \cup (\bigcup_{t>0} F(t))$. Since the distance $(0, F(t)) \to \infty$, choose $t_x$ such that dist. $(0, F(t_x))$ > dist. $(0, x)$. Since x does not lie in the cube P, the points 0 and x are separated by the cube boundary F according to the Jordan separation theorem. Applying the topological theorem with B = F, $\pi_0/F \nsim \pi_x/F$. Then for any t

$$\pi_0/F(t) = \pi_0 f_t/F \sim \pi_0 f_0/F = \pi_0/F \nsim \pi_x/F = \pi_x f_0/F \sim \pi_x f_t/F = \pi_x/F(t),$$

since (a) $f_0/F$ is the identity mapping, (b) $f_0/F \sim f_t/F$ by construction, (c) $\varphi f_0/F \sim \varphi f_t/F$ by composition for any mapping $\varphi$ with correct domain ([9], p. 42). Since homotopy is an equivalence relation, $\pi_0/F(t) \nsim \pi_x/F(t)$. Applying the topological theorem again with $B = F(t_x)$, it follows that the points 0 and x are separated by $F(t_x)$. The line segment [0, x] being connected, must intersect $F(t_x)$ which is impossible since dist. $(0, x)$ < dist. $(0, F(t_x))$. Therefore $P \cup (\bigcup_{t>0} F(t)) = E_p$, completing the proof of the lemma and Theorem 9.3.

THEOREM 9.4. If for each i, $e_i^- < 0 < e_i^+$, then $R^2 \subset R_I$.

The line $L_i = \bigcap_{j \ne i} M_j$ intersects $F_i^+$ at $\xi_i = e_i^+ > 0$ and $F_i^-$ at $\xi_i = e_i^- < 0$. Approximate $F_i^+$ and $F_i^-$ by their tangent hyperplanes at these points. If $T_i^+$ and $T_i^-$ denote the closed half-spaces bounded by these tangent hyperplanes with halves chosen to contain the origin, approximate $R_i$ by $T_i^+ \cap T_i^-$ and $R_I$ by $R^2 = \bigcap_{i=1}^{p} (T_i^+ \cap T_i^-)$. Since $R_i^{*+}$ is open and convex and contains the line $L_i$ for $\xi_i > e_i^+$, then $R_i^{*+}$ does not intersect the tangent hyperplane or the half-space $T_i^+$. Similarly, $R_i^{*-} \cap T_i^- = \emptyset$ so that $T_i^+ \cap T_i^- \subset R_i$ and $R^2 \subset R_I$.
EXACT PROBABILITIES AND ASYMPTOTIC RELATIONSHIPS
FOR SOME STATISTICS FROM m-th ORDER MARKOV
CHAINS¹


By Leo A. Goodman
University of Chicago


Summary. An exact formula is presented for the probability of a specified frequency count of m-tuples ($m \ge 1$) in a sequence $X_1, X_2, \cdots, X_N$ from a Markov chain of order m - 1 having a denumerable number $a \le \infty$ of states. An exact expression is also obtained for the conditional probability of a specified m-tuple count, given the n-tuple count, when the chain is of order n - 1 (n < m). If $a < \infty$, then this conditional probability, when regarded as a statistic computed from the observed sequence, is shown to be asymptotically equivalent to the product of the probabilities (regarded as a statistic) associated with a corresponding set of $a^{n-1}$ contingency tables with assigned marginals (each table having $a^{m-n}$ rows and a columns), where in each table the two attributes described by the table are independent. This fact leads to several simplified tests, related to standard tests of independence in contingency tables, for the null hypothesis $H_{n-1}$ that the Markov chain is of order n - 1 within the alternate hypothesis $H_{m-1}$. Analogous results are also obtained for circular sequences.


1. Introduction. For a circular sequence, Reed Dawson and I. J. Good [4] have presented an exact expression for the conditional probability of a specified frequency count of m-tuples, given the n-tuple count, in the special case where the sequence is stationary and is of so-called zero Markovity; i.e., all (N - 1)! circular permutations of a sequence of N characters are equally likely. It is also proved in [4] that this expression, obtained under the assumption of zero Markovity, is also valid for "negligible" Markovity; i.e., for a stationary chain of order n - 1 or less (n < m). (The term "Markovity of order m" means that the Markov chain, from which a (linear) sequence of observations is obtained, is of order m; a definition of a "chain of order m" is given in [10] and in Section 3 here. The circular sequence is defined in [4] as a linear stationary sequence with the ends joined.) For a (linear) sequence of N consecutive observations from a stationary chain of order n - 1, the conditional probability of a specified m-tuple count, given the n-tuple count, is presented in [4] as the value obtained by augmenting the linear sequence with a blank placed at the end of the sequence, circularizing the augmented sequence, and then applying the exact expression for circular sequences to it. The treatment of linear sequences presented in the present paper is more direct, and leads to some different results from those given in [4]. An exact expression is given here for the probability of a specified m-tuple count in a sequence from a chain in the more general case where it need not be stationary and can be of order m - 1 (a case of nonnegligible Markovity). An exact formula is also obtained for the conditional probability of a specified m-tuple count $f_{i_1 \cdots i_m}$ in a linear sequence, given the n-tuple counts


Si, + Gee, z mH Pee: ee. i and 


t2 im—n 


when the chain need not be stationary and can be of order n — 1. Even in the 
case where the chain is stationary, the formula developed here refers to a dif- 
ferent question and is numerically different from that presented in [4]. (In [4], 
the conditional probability of a specified m-tuple count f;,...;:,, , given the n-tuple 
count f,,...;,, is presented for the stationary chain.) We shall see that, for a 
(linear) sequence of observations from a chain, it appears to be more relevant to 
compute the conditional probability when the n-tuple counts f;,...,,. and f. 
are given, rather than when the n-tuple count f;,...,, is given. 
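
The difference between the m-tuple count of a linear sequence and that of the corresponding circular sequence (the same sequence with the ends joined) can be seen on a small case. The following Python sketch is purely illustrative; the function name `tuple_counts` and the sample sequence are not from the paper.

```python
from collections import Counter


def tuple_counts(seq, m, circular=False):
    """Frequency count of the m-tuples of consecutive elements of seq.
    With circular=True the ends are joined, so every position starts a tuple."""
    seq = list(seq)
    n = len(seq)
    if circular:
        extended = seq + seq[:m - 1]
        starts = range(n)
    else:
        extended = seq
        starts = range(n - m + 1)
    return Counter(tuple(extended[t:t + m]) for t in starts)


sequence = [0, 1, 1, 0, 1]
linear = tuple_counts(sequence, 2)
circular = tuple_counts(sequence, 2, circular=True)
print(linear)      # N - m + 1 = 4 two-tuples in all
print(circular)    # N = 5 two-tuples in all
```

In the circular count the number of 2-tuples beginning with a given state equals the number ending with it, whereas in the linear count these two marginal counts can differ by one at the first and last states of the sequence.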

For stationary circular sequences, it is proved in [4] that, when the chain is of zero order and has $a < \infty$ states, then the conditional probability of the observed m-tuple count $f_{i_1 \cdots i_m}$, given the 1-tuple count $f_i$ (when this probability is regarded as a statistic), is asymptotically equivalent (for N large and $f_i/N \to k_i > 0$), to the probability of the cell entries $f_{i_1 \cdots i_m}$ in a contingency table with assigned marginals $f_{i_1 \cdots i_{m-1}}$ and $f_{i_m}$ when the two attributes described by the table are independent. In the present paper, this result is generalized to show the asymptotic equivalence, when the chain is of order n - 1, between the conditional probability of the observed m-tuple count, given the n-tuple count, and the product of the probabilities of a corresponding set of cell entries in $a^{n-1}$ contingency tables with assigned marginals. An analogous result is also obtained for linear sequences from a chain of order n - 1. (The result in [4] for stationary circular sequences of zero order cannot be applied directly to the conditional probability, presented in [4], for the circularized augmented linear sequence (since the 1-tuple count for the augmented blank is 1 and its ratio to N tends to 0); the authors in [4] refer the reader to the present paper for results for linear sequences.) These results lead to the fact that any asymptotic test of contingency for the $a^{n-1}$ independent contingency tables can be used to test the null hypothesis $H_{n-1}$ that the Markov chain is of order n - 1 within the alternate hypothesis $H_{m-1}$. The likelihood ratio test of $H_{n-1}$ within $H_n$, given by P. G. Hoel [10], can be seen to be of the same form as the joint likelihood ratio test of contingency computed for the $a^{n-1}$ independent contingency tables. The test of $H_{n-1}$ within $H_{m-1}$, presented by I. J. Good [8] for the circularized sequence, can be seen to be of the same form as the joint likelihood ratio test for the $a^{n-1}$ contingency tables related to the frequency count for the circularized (but not augmented) sequence. (Good also deals with the linear sequence in [8], but he agrees that his paper contains some slips. In applying results obtained for circular sequences to linear sequences, there is a real possibility of errors. (See Corrigenda to [8] and Leo A. Goodman [9].)) For the linear sequence, the likelihood ratio test of $H_{n-1}$ within $H_{m-1}$, and the $\chi^2$-test of the form used in contingency tables (which is equivalent to the likelihood ratio test), were presented by T. W. Anderson and Leo A. Goodman [1]; but these authors were concerned mainly, in [1], with a large number v of sequences of N consecutive observations from a chain with a finite number of states, where $v \to \infty$ and N was fixed and could, in fact, be small. There was one brief section in [1] dealing with v = 1 and $N \to \infty$, and it was based on a long sequence (asymptotic) result, due to M. S. Bartlett [2], concerning the 2-tuple count. The results developed in the present paper are based directly on the exact formula for the distribution of the m-tuple count when v = 1 and the chain has denumerably many states.

The approach used here is related to, but different from, earlier work ([1], [2], [6], [13]), where the observed transition proportions were shown to have some properties similar to those of the observed proportions from a set of independent multinomial distributions.

The exact formula developed here for the distribution of the m-tuple count 
from a chain of order m — 1 is a generalization of a result, due to P. Whittle 
[13], for the special case of m = 2. A different, and perhaps simpler, proof of the 
result in [13] will be presented, and it will be related to the work in [4]. The gen- 
eralization in the present paper is based directly on this result. 

When indicating how many degrees of freedom certain statistics (which were asymptotically $\chi^2$) had, most of the articles mentioned in this section assumed (either explicitly or implicitly) that all the transition probabilities in the Markov chain were positive; for the sake of simplicity, we shall do likewise here when indicating the size of certain contingency tables (and thus how many degrees of freedom the $\chi^2$ statistics corresponding to these tables have). If some of these probabilities are zero, then the methods developed in the present paper can be modified in a straightforward manner to obtain analogous results (see [2]).

2. The 2-tuple and 1-tuple counts. Suppose that a sequence $X_1, X_2, \cdots, X_N$ is obtained from a first order Markov chain with constant transition probability matrix $P = [p_{ij}]$; i.e., the probability is $p_{ij}$ that $X_t = j$, given that $X_{t-1} = i$. For the sake of simplicity, we first assume that the chain has a finite number $a < \infty$ of states. We write $f_{ij}$ for the frequency in the sequence of the 2-tuple (i, j) (i, j = 1, 2, $\cdots$, a); we also write $\sum_j f_{ij} = f_{i\cdot}$ and $\sum_i f_{ij} = f_{\cdot j}$. If the chain begins in state r and ends in state s ($X_1 = r$ and $X_N = s$), then

(1) $f_{i\cdot} - f_{\cdot i} = \delta_{ir} - \delta_{is}$ (i = 1, 2, $\cdots$, a),

(2) $\sum_i \sum_j f_{ij} = N - 1$,

where $\delta_{ij}$ equals 1 or 0 according as i and j are equal or unequal. The following result, based on the work in [13], will be used here. Let $T_s(f_{ij})$ be the (sr)th cofactor of the a × a matrix $[\delta_{ij} - f_{ij}/f_{i\cdot}] = M$ if the $f_{ij}$ satisfy (1) and (2), and let it be zero otherwise. (It can be seen that $T_s(f_{ij})$ does not depend on r and is nonnegative.) Then the probability $\Pi_r(f_{ij}, s)$ that the 2-tuple count will be $f_{ij}$ (i, j = 1, 2, $\cdots$, a) and that the sequence ends in s, given that it begins with r, is

(3) $\Pi_r(f_{ij}, s) = T_s(f_{ij}) \left[\prod_i f_{i\cdot}! \Big/ \prod_{i,j} f_{ij}!\right] \prod_{i,j} p_{ij}^{f_{ij}}$.

(Actually, it is stated in [13] that (3) is the probability $\Pi_{rs}(f_{ij})$ that the 2-tuple count is $f_{ij}$ (i, j = 1, 2, $\cdots$, a), given that the sequence begins with r and ends with s; this is not quite correct, but can easily be corrected, as has been done here.)

Formula (3) will hold only if $N \ge a$, and $f_{i\cdot} > 0$ and $f_{\cdot i} > 0$ (i = 1, 2, $\cdots$, a). However, for N < a or some $f_{i\cdot}$ or $f_{\cdot i}$ equal to 0, (3) still holds if calculated on the basis of a process including only those states that have been observed (see [13]).

A proof of (3), different from that given in [13], will now be presented, since it may increase the understanding of this formula and also since a somewhat different procedure for computing (3) is obtained. This proof uses an approach similar to that applied in [4] to circular sequences with negligible Markovity. It is based on the following combinatorial theorem, called the BEST theorem in [4] (due to N. G. de Bruijn, T. van Aardenne-Ehrenfest, C. A. B. Smith and W. T. Tutte [5]): Given any a × a matrix $M = [m_{ij}]$ of nonnegative integers, there corresponds an oriented linear graph, with vertices 1, 2, $\cdots$, a, such that the number of oriented paths (edges) leading from vertex i to vertex j equals $m_{ij}$. The matrix, unique to within the same rearrangement of rows as of columns, is called the incidence matrix of the corresponding oriented linear graph. The graph is defined as simple if $m_{i\cdot} = \sum_j m_{ij} = \sum_j m_{ji}$. A circuit in such a graph is defined as a unicursal path passing exactly once through each edge (in the right direction). Let $M' = [m'_{ij}]$ be the a' × a' matrix formed from M by deleting every row and column consisting wholly of zeros. Then $\sum_j m'_{ij} = \sum_j m'_{ji} = m'_{i\cdot} > 0$, for i = 1, 2, $\cdots$, a'. Let $M^* = [m^*_{ij}]$, where $m^*_{ij} = m'_{i\cdot}\delta_{ij} - m'_{ij}$. Since $M^*$ is a square matrix with each row and column summing to zero, the cofactors of its elements are all equal; let $\|M^*\|$ be the common value of these cofactors. Then the BEST theorem asserts that the number C(M) of distinct circuits, when all the edges are distinguishable, in a simple oriented linear graph with incidence matrix M is

$$C(M) = \|M^*\| \cdot \prod_{i=1}^{a'} (m'_{i\cdot} - 1)!.$$
Let $N_{rs}(M)$ be the number of circuits that begin at vertex $r$ and end at
vertex $s$; i.e., the number of paths that pass once through each edge, except
for one of the edges leading from vertex $s$ to vertex $r$. Then, when all the
$m_{ij}$ oriented edges from vertex $i$ to $j$ are distinguishable, we have that
$N_{rs}(M) = C(M) m_{sr}$. If these edges are not distinguishable, then the
number of circuits that begin at vertex $r$ and end at vertex $s$ is
$U_{rs}(M) = N_{rs}(M) / \prod_{ij} m_{ij}! = C(M) / \prod_{ij} f_{ij}!$, where
$f_{sr} = m_{sr} - 1$ and $f_{ij} = m_{ij}$ for $(i, j) \ne (s, r)$; $U_{rs}(M)$
is the number of paths that pass directly from vertex $i$ to $j$ in total
$f_{ij}$ times, and that begin at vertex $r$ and end at $s$. If
$\sum_i \sum_j f_{ij} = N - 1$, the probability of observing any given path (a
sequence of vertices or states) that begins at $r$ and ends at $s$ in a
sequence $X_1, X_2, \cdots, X_N$ from a chain with transition probability matrix
$P = [p_{ij}]$, given that $X_1 = r$, is $\prod_{ij} p_{ij}^{f_{ij}}$. Since the
number of such paths is $U_{rs}(M)$, the probability of observing one of these
paths is


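$$\Pi_r(f_{ij}, s) = U_{rs}(M) \prod_{ij} p_{ij}^{f_{ij}}
  = T_{sr}(f_{ij}) \, \frac{\prod_i f_{i\cdot}!}{\prod_{ij} f_{ij}!} \prod_{ij} p_{ij}^{f_{ij}},$$

where $T_{sr}(f_{ij})$ is the cofactor of an element in the $s$th row of the
matrix $M^{**} = [m^{**}_{ij}]$, where $m^{**}_{ij} = m^*_{ij} / m'_i$. Q.E.D.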
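The counting step of this proof can be checked numerically. The following is a minimal Python sketch, not taken from the paper: it assumes 0-based state labels and that every row total $f_{i\cdot}$ is positive, and the function names are hypothetical. It compares the number of paths given by $T_{sr}(f_{ij}) \prod_i f_{i\cdot}! / \prod_{ij} f_{ij}!$ with a brute-force enumeration of all sequences.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def count_paths_formula(f, r, s):
    """T_sr(f_ij) * prod_i f_i.! / prod_ij f_ij!, with 0-based states r and s."""
    a = len(f)
    row_sums = [sum(row) for row in f]  # assumption: every f_i. > 0
    m = [[Fraction(int(i == j)) - Fraction(f[i][j], row_sums[i]) for j in range(a)]
         for i in range(a)]
    # (s, r) cofactor: delete row s and column r, then apply the sign.
    minor = [row[:r] + row[r + 1:] for i, row in enumerate(m) if i != s]
    count = (-1) ** (s + r) * det(minor)
    for total in row_sums:
        count *= factorial(total)
    for row in f:
        for x in row:
            count /= factorial(x)
    return count


def count_paths_brute(f, r, s):
    """Enumerate every sequence of length N and count those with 2-tuple count f."""
    a, n = len(f), sum(map(sum, f)) + 1
    hits = 0
    for seq in product(range(a), repeat=n):
        if seq[0] != r or seq[-1] != s:
            continue
        c = [[0] * a for _ in range(a)]
        for x, y in zip(seq, seq[1:]):
            c[x][y] += 1
        hits += c == f
    return hits
```

For the 2-tuple count $f_{11} = 0, f_{12} = 2, f_{21} = 1, f_{22} = 1$ with $r = 1, s = 2$ (states 0 and 1 in the code), both functions return 2.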

A similar proof was also independently obtained by Dawson and Good in an
unpublished note. This proof indicates that (3) holds even when $a$ is infinite,
since it depends essentially on the $a' \times a'$ matrix $M^*$, where $a'$ is
finite when $N$ is finite, rather than on the $a \times a$ matrix $M$. Thus, the
exact formula, presented in [13] for the chain with a finite number of states,
also holds for the chain with denumerably many states. (See [6] for some
asymptotic distribution theory for first order chains with denumerably many
states.²) This proof also indicates that $\Pi_r(f_{ij}, s)$ can be computed from
the expression $(C(M) / \prod_{ij} f_{ij}!) \prod_{ij} p_{ij}^{f_{ij}}$, where
$m_{ij} = f_{ij}$ for $(i, j) \ne (s, r)$ and $m_{sr} = f_{sr} + 1$, which may
sometimes be simpler to apply directly than (3).

If $r$ is given, and the $f_{ij}$ satisfy (2) and also (1) for some $s$, then
that $s$ is unique. Thus, $s$ can be determined by (1) as a function of the
$f_{ij}$ when $r$ is given. Since the probability $\Pi_r(f_{ij})$ of the 2-tuple
count $f_{ij}$, given that $X_1 = r$, is obtained by

(4) $$\Pi_r(f_{ij}) = \sum_x \Pi_r(f_{ij}, x);$$

and since $\Pi_r(f_{ij}, x)$ is 0 for all values of $x \ne s$, we have that
$\Pi_r(f_{ij})$ is equal to (3), if the $f_{ij}$ satisfy (2) and also (1) for
some $s$.

The probability $\Pi_r(f_{ij} \mid s)$ of the 2-tuple count $f_{ij}$
($i, j = 1, 2, \cdots$), given that the sequence begins with $r$ and ends with
$s$ (which is the verbal (not quite correct) description that was given in [13]
for (3)), can actually be obtained by dividing $\Pi_r(f_{ij}, s)$ by the
probability $p_{rs}^{(N-1)}$ that the sequence ends with $s$: $p_{rs}^{(N-1)}$ is
the $(rs)$th element of the transition probability matrix
$P^{N-1} = [p_{rs}^{(N-1)}]$.

² I am indebted to K. L. Chung for bringing [6] to my attention.

If $X_1$ is a random variable with a probability $p_r$ of being in the $r$th
state, then the probability $\Pi(f_{ij}, s, r)$ that the 2-tuple count is
$f_{ij}$, that $X_N = s$ and $X_1 = r$, is simply $\Pi_r(f_{ij}, s) p_r$. The
general approach given here can also be used to obtain exact expressions for the
probability $\Pi(f_{ij}, \cdot, r)$ that the 2-tuple count is $f_{ij}$ and that
$X_1 = r$, for the probability $\Pi(f_{ij}, s)$ that the 2-tuple count is
$f_{ij}$ and that $X_N = s$, for the probability $\Pi(f_{ij})$ that the 2-tuple
count is $f_{ij}$, and also for various conditional probabilities.

The distribution of the $f_{\cdot j} = \sum_i f_{ij}$ will now be studied, when
the chain is of zero order; i.e., $p_{ij} = p_{\cdot j}$ for $i, j = 1, 2, \cdots$.
The $f_{\cdot j}$ are the 1-tuple frequencies in the sequence
$X_2, X_3, \cdots, X_N$; i.e., $f_{\cdot j}$ is the number of observations in
state $j$ among this sequence. The probability $\Pi_r(f_{\cdot j}, s)$ that the
1-tuple count in this sequence is $f_{\cdot j}$ and that $X_N = s$, can be
derived using the standard multinomial formula, and we obtain

(5) $$\Pi_r(f_{\cdot j}, s) = \left(\frac{f_{\cdot s}}{N - 1}\right)
  \frac{(N - 1)!}{\prod_j f_{\cdot j}!} \prod_j p_{\cdot j}^{f_{\cdot j}}.$$



Therefore, for a zero order chain, the conditional probability
$\Pi_r(f_{ij} \mid f_{\cdot j}, s)$ of the 2-tuple count $f_{ij}$, given the
$f_{\cdot j}$ and $s$ and $r$, can be obtained by dividing (3) by (5), when the
$f_{ij}$ satisfy (1), (2), and also $\sum_i f_{ij} = f_{\cdot j}$. (We can
assume, without loss of generality, that $f_{\cdot j}$ and $s$ are such that
$\Pi_r(f_{\cdot j}, s) > 0$.) Thus

(6) $$\Pi_r(f_{ij} \mid f_{\cdot j}, s) =
  \left[\frac{T_{sr}(f_{ij})}{f_{\cdot s} / (N - 1)}\right]
  \left[\frac{\prod_i f_{i\cdot}! \prod_j f_{\cdot j}!}{(N - 1)! \prod_{ij} f_{ij}!}\right];$$

the second factor is the probability $P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ of
the cell entries $f_{ij}$ in an ordinary contingency table with assigned
marginals $f_{\cdot j}$ and $f_{i\cdot}$, where the two attributes described by
the table are independent (see, e.g., [12], p. 278).
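As an illustration of how the two factors in (6) combine, here is a minimal Python sketch under stated assumptions: states are labelled from 0, every $f_{i\cdot}$ is positive, and the function names are hypothetical rather than taken from the paper. The counts in the usage note are those of the illustration in Section 6.

```python
from fractions import Fraction
from math import factorial, prod


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def contingency_probability(f):
    """Second factor of (6): prod f_i.! prod f_.j! / ((N - 1)! prod f_ij!)."""
    rows = [sum(r) for r in f]
    cols = [sum(c) for c in zip(*f)]
    num = prod(factorial(x) for x in rows) * prod(factorial(x) for x in cols)
    den = factorial(sum(rows)) * prod(factorial(x) for r in f for x in r)
    return Fraction(num, den)


def conditional_probability(f, r, s):
    """Formula (6) for a zero order chain; r and s are 0-based states."""
    a = len(f)
    rows = [sum(row) for row in f]          # f_i., assumed positive
    cols = [sum(col) for col in zip(*f)]    # f_.j
    m = [[Fraction(int(i == j)) - Fraction(f[i][j], rows[i]) for j in range(a)]
         for i in range(a)]
    minor = [row[:r] + row[r + 1:] for i, row in enumerate(m) if i != s]
    t_sr = (-1) ** (s + r) * det(minor)     # (s, r) cofactor of [delta_ij - f_ij / f_i.]
    first_factor = t_sr / Fraction(cols[s], sum(rows))  # sum(rows) equals N - 1
    return first_factor * contingency_probability(f)
```

With `f = [[0, 2], [1, 1]]`, `r = 0` and `s = 1`, the first factor is 4/3, the contingency factor is 1/2, and `conditional_probability` returns 2/3.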

Since the $f_{i\cdot}$ can be determined by (1) from the $f_{\cdot j}$, $r$, and
$s$, we have that (6) is also the conditional probability
$\Pi_r(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ of the $f_{ij}$, given the
$f_{\cdot j}$, $f_{i\cdot}$, and $r$; and (5) is the conditional probability
$\Pi_r(f_{\cdot j}, f_{i\cdot})$ of the $f_{\cdot j}$ and $f_{i\cdot}$, given $r$.
Furthermore, since the 1-tuple count $f_j$ in the sequence
$X_1, X_2, \cdots, X_N$ can be determined from $f_{\cdot j}$ and $r$ by the
relation $f_j = f_{\cdot j} + \delta_{jr}$, (6) is also the conditional
probability $\Pi_r(f_{ij} \mid f_j, s)$ of the 2-tuple count $f_{ij}$, given the
1-tuple count $f_j$, and $r$ and $s$.

From (6), we obtain

(7) $$\frac{T_{sr}(f_{ij})}{f_{\cdot s} / (N - 1)} =
  \frac{\Pi_r(f_{ij} \mid f_{\cdot j}, f_{i\cdot})}{P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})}.$$

We shall now prove that the statistic (7) converges in probability to unity (thus
$\Pi_r(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ and
$P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ are asymptotically equivalent), if the
chain is of zero order and $p_{ij} = p_{\cdot j} > 0$. In this case,
$f_{\cdot s} / (N - 1)$ converges in probability to $p_{\cdot s}$, and it will be
necessary to prove only that $T_{sr}(f_{ij})$ converges in probability to
$p_{\cdot s}$.

For the sake of simplicity, assume that the chain has a finite number of states
and that $f_{i\cdot} > 0$ for $i = 1, 2, \cdots, a$. The $a \times a$ matrix
$M^{**}$ will converge in probability to $M = [\delta_{ij} - p_{\cdot j}]$, and
$T_{sr}(f_{ij})$, which is the $(sr)$th cofactor of $M^{**}$, will therefore
converge in probability to the $(sr)$th cofactor of $M$. Since the sum of the
entries in each row of $M$ is zero, the cofactors in row $s$ are all equal to the
$(ss)$th cofactor $| M_s |$, the determinant of the $(a - 1) \times (a - 1)$
matrix $M_s$ obtained by deleting the $s$th row and column in $M$. By some
elementary transformations of $M_s$, or by the identities between the cofactors
and the elements of a matrix (see, e.g., p. 109 in [3]) and the relationship
between the principal minors and the characteristic equation (see, e.g., p. 19
in [11]), we see that $| M_s | = p_{\cdot s}$. Hence, $T_{sr}(f_{ij})$ converges
in probability to $p_{\cdot s}$. Q.E.D.
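The identity $| M_s | = p_{\cdot s}$ used in this proof is easy to confirm on a small case. The sketch below is only an illustration: the probabilities are arbitrary values chosen here, states are 0-based, and the helper names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def principal_cofactor(p, s):
    """|M_s| for M = [delta_ij - p_.j]: delete row s and column s, take the determinant."""
    a = len(p)
    m = [[Fraction(int(i == j)) - p[j] for j in range(a)] for i in range(a)]
    return det([row[:s] + row[s + 1:] for i, row in enumerate(m) if i != s])


# Illustrative zero order chain with three states (values are an assumption).
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
cofactors = [principal_cofactor(p, s) for s in range(len(p))]
```

Here `cofactors` equals `p` entry by entry, in line with $| M_s | = 1 - \sum_{j \ne s} p_{\cdot j} = p_{\cdot s}$.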

This result, concerning the asymptotic equivalence (under the assumption
$H_0$ of zero Markovity) of $\Pi_r(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ and
$P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$, implies that the null hypothesis $H_0$
can be tested, within $H_1$, by any asymptotic test of contingency in the
contingency table with cell entries $f_{ij}$ and with assigned marginals
$f_{\cdot j}$ and $f_{i\cdot}$. This implication follows from an application of
the following lemma proved in [4]: If (a) an experiment (with parameter $N$) has,
for each value of $N$ (positive integers tending to infinity), a finite set
$F^N = \{F^N_i\}$ of possible outcomes, (b) $P_N$ (or simply $P$) and $P'_N$ (or
simply $P'$) are two probability measures over $F^N$ such that
$P'(F^N_i) / P(F^N_i)$ converges in the probability $P$ to unity, where
$P'(F^N_i) / P(F^N_i)$ is regarded as a statistic whose distribution is
determined by $P$, and (c) $S(F^N_i)$ is a statistic whose cumulative
distribution function $\Phi_N$ converges, as $N$ becomes infinite, to a limiting
distribution $\Phi$ under $P$, then the distribution function $\Phi'_N$ of
$S(F^N_i)$ under $P'$ also converges to the same limiting distribution $\Phi$.
This lemma can be applied, in order to obtain the desired implication, by taking
$P(F^N_i) = [P(f_{ij} \mid f_{\cdot j}, f_{i\cdot}) \, \Pi_r(f_{\cdot j}, s)]$,
and $P'(F^N_i) = \Pi_r(f_{ij}, s)$. Since $H_0$ is assumed,
$P'(F^N_i) / P(F^N_i) = \Pi_r(f_{ij} \mid f_{\cdot j}, f_{i\cdot}) / P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$
will converge in probability to unity. Since any asymptotic test of contingency
in the contingency table with cell entries $f_{ij}$ and with assigned marginals
$f_{\cdot j}$ and $f_{i\cdot}$ will have the same asymptotic distribution under
$P(f_{ij} \mid f_{\cdot j}, f_{i\cdot})$ (i.e., in the standard case) as under
$P(F^N_i)$ (since the $f_{\cdot j} / N$ and $f_{i\cdot} / N$ converge in
probability to $p_{\cdot j}$ and $p_{\cdot i}$ respectively), it follows from the
lemma that the same standard asymptotic distribution will also hold under
$P'(F^N_i)$ (i.e., when $H_0$ is true).

Since the sequence obtained from the chain is finite, it will not provide
estimates of $p_{ij}$, for all $i, j$, if the chain has a denumerable infinity of
states (see [6]). Thus, when $a = \infty$, select (independently of the data) a
finite subset of, say, $b$ states, and consider all states that are not included
in this subset as belonging to a single state; i.e., reduce the original number
of states to $b + 1 = a'$ in the modified sequence. The tests of $H_0$, suggested
in this section for the case where $a < \infty$, can be applied to the modified
sequence consisting of $a'$ states, and the results presented will hold also for
this case. A rejection of $H_0$ for the modified sequence would imply a rejection
of this hypothesis for the original chain consisting of denumerably many states.
This general method is applied to some different hypotheses relating to Markov
chains on p. 293 in [6].

3. The (n + 1)-tuple and the n-tuple counts. Suppose that a sequence
$X_1, X_2, \cdots, X_N$ is obtained from a Markov chain of order $n$ ($n \ge 1$),
where the probability is $p_{\mathbf{i}j}$ that $X_t = j$, given that
$(X_{t-n}, X_{t-n+1}, \cdots, X_{t-1}) = (i_1, i_2, \cdots, i_n) = \mathbf{i}$.
For the sake of simplicity, we assume that there are $a$ states in this chain;
i.e., $X_t$ can take its value from among $a$ possible values of $j$. We define a
new sequence of random vectors $Z_1 = (X_1, X_2, \cdots, X_n)$,
$Z_2 = (X_2, X_3, \cdots, X_{n+1})$, $\cdots$,
$Z_{N-n+1} = (X_{N-n+1}, X_{N-n+2}, \cdots, X_N)$, where each vector can take its
value from among the $a^n$ possible values of $\mathbf{i}$. The probability
$p_{\mathbf{ij}}$ that $Z_t = \mathbf{j}$, given that $Z_{t-1} = \mathbf{i}$, is
equal to $p_{\mathbf{i}j_n}$ for $i^* = j'$, where $i^* = (i_2, i_3, \cdots, i_n)$,
$j' = (j_1, j_2, \cdots, j_{n-1})$, $\mathbf{j} = (j', j_n)$, and
$p_{\mathbf{ij}}$ is zero otherwise. The sequence $Z_1, Z_2, \cdots, Z_{N-n+1}$
is from a first order chain with constant transition probability matrix
$P_n = [p_{\mathbf{ij}}]$. This chain has $a^n$ states; $P_n$ is an
$a^n \times a^n$ matrix (see [2]).
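The construction of the $Z$ sequence and of $P_n$ can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's notation: states are labelled 0 to $a - 1$, the transition law of the order-$n$ chain is passed as a hypothetical dictionary `p` mapping each $n$-tuple to a list of $a$ probabilities, and all function names are invented here.

```python
from collections import Counter
from itertools import product


def lift_to_first_order(p, a, n):
    """Return the a**n states and the entries p_ij of P_n as a dict keyed by (i, j)."""
    states = list(product(range(a), repeat=n))
    # A transition i -> j is possible only if i* = (i_2..i_n) equals j' = (j_1..j_{n-1}).
    big = {(i, j): (p[i][j[-1]] if i[1:] == j[:-1] else 0)
           for i in states for j in states}
    return states, big


def z_sequence(x, n):
    """Z_t = (X_t, ..., X_{t+n-1}) for t = 1, ..., N - n + 1."""
    return [tuple(x[t:t + n]) for t in range(len(x) - n + 1)]


def tuple_counts(x, k):
    """k-tuple count in the linear sequence x."""
    return Counter(tuple(x[t:t + k]) for t in range(len(x) - k + 1))
```

Each row of the lifted matrix sums to one, and counting the pairs $(Z_t, Z_{t+1})$ reproduces the $(n + 1)$-tuple count of the $X$ sequence, which is the correspondence the next paragraph relies on.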

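The frequency $f_{\mathbf{ij}}$ of the 2-tuple $(\mathbf{i}, \mathbf{j})$ in the
sequence of $(N - n + 1)$ observed $Z$'s gives the $(n + 1)$-tuple frequency
$f_{\mathbf{i}j_n}$ in the sequence of $N$ observed $X$'s for all
$(\mathbf{i}, \mathbf{j})$ where $\mathbf{j} = (i^*, j_n)$, and $f_{\mathbf{ij}}$
will be zero otherwise. In other words, the frequency $f_{\mathbf{i}i_{n+1}}$ in
the sequence of $X$'s of the $(n + 1)$-tuple $(i_1, i_2, \cdots, i_{n+1})$ is the
number of values of $t$ for which
$(X_t, X_{t+1}, X_{t+2}, \cdots, X_{t+n}) = (\mathbf{i}, i_{n+1})$, i.e., the
number $f_{\mathbf{ij}}$ of values of $t$ for which $Z_t = \mathbf{i}$ and
$Z_{t+1} = \mathbf{j}$, for $\mathbf{j} = (i^*, i_{n+1})$. Since $f_{\mathbf{ij}}$
is the 2-tuple count in a sequence from a first order chain, (3) can be applied
to obtain the probability $\Pi_{\mathbf{r}}(f_{\mathbf{ij}}, \mathbf{s})$ that the
$(n + 1)$-tuple count in the observed sequence of $X$'s will be
$f_{\mathbf{i}j}$, and that $Z_{N-n+1} = \mathbf{s}$, given that
$Z_1 = \mathbf{r}$. We obtain

(9) $$\Pi_{\mathbf{r}}(f_{\mathbf{ij}}, \mathbf{s})
  = T_{\mathbf{sr}}(f_{\mathbf{ij}}) \frac{\prod_{\mathbf{i}} f_{\mathbf{i}\cdot}!}{\prod_{\mathbf{i},\mathbf{j}} f_{\mathbf{ij}}!} \prod_{\mathbf{i},\mathbf{j}} p_{\mathbf{ij}}^{f_{\mathbf{ij}}}
  = T_{\mathbf{sr}}(f_{\mathbf{ij}}) \frac{\prod_{\mathbf{i}} f_{\mathbf{i}\cdot}!}{\prod_{\mathbf{i},j} f_{\mathbf{i}j}!} \prod_{\mathbf{i},j} p_{\mathbf{i}j}^{f_{\mathbf{i}j}},$$

where $T_{\mathbf{sr}}(f_{\mathbf{ij}})$ is the $(\mathbf{sr})$th cofactor of the
$a^n \times a^n$ matrix
$[\delta_{\mathbf{ij}} - f_{\mathbf{ij}} / f_{\mathbf{i}\cdot}] = \hat{M}_n$.
This result could also be obtained by applying the BEST theorem to the vertices
$\mathbf{i}$.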

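The probability $\Pi_{\mathbf{r}}(f_{\mathbf{ij}}) = \Pi_{\mathbf{r}}(f_{\mathbf{i}j})$
of the $(n + 1)$-tuple count $f_{\mathbf{i}j}$ ($j = 1, 2, \cdots, a$, and
$\mathbf{i} = 1, 2, \cdots, a^n$), given that $Z_1 = \mathbf{r}$, can be obtained
from (9), by applying (1) and (2) to the sequence of $Z$'s. Also, the probability
$\Pi_{\mathbf{r}}(f_{\mathbf{ij}} \mid \mathbf{s}) = \Pi_{\mathbf{r}}(f_{\mathbf{i}j} \mid \mathbf{s})$
of the $(n + 1)$-tuple count $f_{\mathbf{i}j}$, given that $Z_1 = \mathbf{r}$ and
$Z_{N-n+1} = \mathbf{s}$, can be determined with the aid of (9) and the
$(N - n)$th power of $P_n$.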

The distribution of the $f_{\cdot i^* j}$ will now be studied, when the sequence
of $X$'s is from a chain of order $n - 1$ ($n > 1$). If the chain is of order
$n - 1$ (within the hypothesis $H_n$), then $p_{\mathbf{i}j} = p_{\cdot i^* j}$
for $i_1, j = 1, 2, \cdots, a$ and for all $a^{n-1}$ values of $i^*$.

We define a new sequence of random vectors $W_1 = (X_2, X_3, \cdots, X_n)$,
$W_2 = (X_3, X_4, \cdots, X_{n+1})$, $\cdots$,
$W_{N-n+1} = (X_{N-n+2}, X_{N-n+3}, \cdots, X_N)$, where each vector can take its
value from among the $a^{n-1}$ possible vectors $i^*$. The probability
$p_{i^* j^*}$ that $W_t = j^*$, given that $W_{t-1} = i^*$, is equal to
$p_{\cdot i^* j_n}$ for $I = J$, where $I = (i_3, i_4, \cdots, i_n)$,
$J = (j_2, j_3, \cdots, j_{n-1})$, $i^* = (i_2, I)$, $j^* = (J, j_n)$, and
$p_{i^* j^*}$ is zero otherwise. We have that $p_{i^* j^*} = p_{\mathbf{ij}}$ for
$j_1 = i_2$ and for all values of $i_1$, where $\mathbf{i} = (i_1, i^*)$ and
$\mathbf{j} = (j_1, j^*)$. The sequence $W_1, W_2, \cdots, W_{N-n+1}$ is from a
first order chain with transition probability matrix
$P_{n-1} = [p_{i^* j^*}]$. This chain has $a^{n-1}$ states.

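The $n$-tuple count $f_{\cdot \mathbf{j}} = g_{\mathbf{j}}$ in the sequence
$X_2, X_3, \cdots, X_N$ can be determined by the 2-tuple count $g_{i^* j^*}$ in
the sequence of $W$'s. Also, $g_{i^* j^*} = g_{\mathbf{j}}$ for
$\mathbf{j} = (j_1, j^*) = (i^*, j_n)$. For the $n$-tuple count $h_{\mathbf{j}}$
in the sequence $X_2, X_3, \cdots, X_{N-1}$, we have that
$h_{\mathbf{j}} = g_{\mathbf{j}} - \delta_{\mathbf{js}}$. Since $h_{\mathbf{j}}$
can be determined by the 2-tuple count $h_{i^* j^*}$ in the sequence
$W_1, W_2, \cdots, W_{N-n}$ from a first order chain, (3) can be applied to
obtain the probability $\Pi_{r^*}(h_{i^* j^*}, s')$ that the $n$-tuple count in
the sequence $X_2, X_3, \cdots, X_{N-1}$ will be $h_{\mathbf{j}}$ and that
$W_{N-n} = s'$, given that $W_1 = r^*$. The probability
$\Pi_{r^*}(g_{\mathbf{j}}, \mathbf{s})$ that the $n$-tuple count in the sequence
$X_2, X_3, \cdots, X_N$ will be $g_{\mathbf{j}}$ and that
$Z_{N-n+1} = \mathbf{s}$, given that $W_1 = r^*$, is simply
$\Pi_{r^*}(h_{i^* j^*}, s') \, p_{\cdot s' s}$. Thus,

(10) $$\Pi_{r^*}(g_{\mathbf{j}}, \mathbf{s}) = T_{s' r^*}(h_{i^* j^*})
  \left(\frac{\prod_{i^*} h_{i^* \cdot}!}{\prod_{i^*, j^*} h_{i^* j^*}!}\right)
  \prod_{i^*, j^*} p_{i^* j^*}^{g_{i^* j^*}},$$

where $g_{\mathbf{j}} = h_{\mathbf{j}} + \delta_{\mathbf{js}}$.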
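The $(n + 1)$-tuple count in the sequence $X_1, X_2, \cdots, X_N$ can be denoted
by $f_{i_1 i^* j}$ or $f_{i_1 j' j}$, where $i^* = j'$. Also,
$\sum_{i_1} f_{i_1 i^* j} = f_{\cdot i^* j} = f_{\cdot \mathbf{j}} = g_{\mathbf{j}} = g_{i^* j^*}$
and

$$\sum_j f_{\mathbf{i}j} = f_{\mathbf{i}\cdot}, \qquad
  \sum_{i_1} \sum_j f_{i_1 i^* j} = f_{\cdot i^* \cdot},$$

where $\mathbf{i} = (i_1, i^*)$ and $\mathbf{j} = (j', j)$. Thus the probability
$\Pi_{\mathbf{r}}(f_{\cdot i^* j}, \mathbf{s})$ that

$$\sum_{i_1} f_{i_1 i^* j} = f_{\cdot i^* j}$$

and that $Z_{N-n+1} = \mathbf{s}$, given that $Z_1 = \mathbf{r}$, is

(11) $$\Pi_{\mathbf{r}}(f_{\cdot i^* j}, \mathbf{s}) = T_{s' r^*}(h_{i^* j^*})
  \left(\frac{\prod_{i^*} h_{i^* \cdot}!}{\prod_{i^*, j^*} h_{i^* j^*}!}\right)
  \prod_{i^*, j} p_{\cdot i^* j}^{f_{\cdot i^* j}},$$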


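where $\mathbf{s} = (s', s)$. Therefore, if the chain is of order $n - 1$, the
conditional probability of the $(n + 1)$-tuple count
$f_{\mathbf{i}j} = f_{i_1 i^* j}$, given the $f_{\cdot i^* j}$ and $\mathbf{s}$
and $\mathbf{r}$, is obtained by dividing (9) by (11); i.e.,
$\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, \mathbf{s})$ is equal to

(12) $$\left[\frac{T_{\mathbf{sr}}(f_{\mathbf{ij}})}{T_{s' r^*}(h_{i^* j^*}) \, (f_{\cdot s' s} / f_{\cdot s' \cdot})}\right]
  \left[\prod_{i^*} \frac{\prod_{i_1} f_{i_1 i^* \cdot}! \prod_j f_{\cdot i^* j}!}{f_{\cdot i^* \cdot}! \prod_{i_1, j} f_{i_1 i^* j}!}\right].$$

(We can assume, without loss of generality, that the $f_{\cdot i^* j}$ and
$\mathbf{s}$ are such that $\Pi_{\mathbf{r}}(f_{\cdot i^* j}, \mathbf{s}) > 0$.)
The second factor in (12) is

$$\prod_{i^*} P_{i^*}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})
  = P(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot}),$$

the product of the probabilities of the cell entries $f_{i_1 i^* j}$ in an
$a \times a$ contingency table (for a given $(n - 1)$-tuple $i^*$), with
assigned marginals $f_{\cdot i^* j}$ and $f_{i_1 i^* \cdot}$, where in each table
the two attributes described by the table are independent; i.e., the joint
probability of the cell entries $f_{i_1 i^* j}$ for all $a^{n-1}$ independent
contingency tables.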

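It can be seen that (12) is also the conditional probability

$$\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})$$

of the $f_{i_1 i^* j}$, given the $f_{\cdot i^* j}$, $f_{i_1 i^* \cdot}$, and
$\mathbf{r}$; it is also the conditional probability
$\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{i^* j}, \mathbf{s})$ of the
$(n + 1)$-tuple count $f_{i_1 i^* j}$, given the $n$-tuple count $f_{i^* j}$, and
$\mathbf{r}$ and $\mathbf{s}$.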

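From (12), we have that

(13) $$\frac{T_{\mathbf{sr}}(f_{\mathbf{ij}})}{T_{s' r^*}(h_{i^* j^*}) \, (f_{\cdot s' s} / f_{\cdot s' \cdot})}
  = \frac{\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})}{P(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})}.$$

We shall now prove that the statistic (13) converges in probability to unity
(thus $T_{\mathbf{sr}}(f_{\mathbf{ij}})$ and
$T_{s' r^*}(h_{i^* j^*}) (f_{\cdot s' s} / f_{\cdot s' \cdot})$ are asymptotically
equivalent, and
$\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})$ and
$P(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})$ are also asymptotically
equivalent), if the chain is of order $n - 1$.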
If the chain is of order $n$ (a chain of order $n - 1$ is also of order $n$), we
saw earlier that a first order chain could be defined with transition
probability matrix $P_n = [p_{\mathbf{ij}}]$, and we shall assume that the
asymptotic occupation probabilities $p_{\mathbf{j}}$ for this first order chain
are all positive; i.e., $p_{\mathbf{j}} > 0$, where $p_{\mathbf{j}}$ is such that
$\sum_{\mathbf{i}} p_{\mathbf{i}} p_{\mathbf{ij}} = p_{\mathbf{j}}$ for all
$\mathbf{j}$. This will be the case if the chain described by $P_n$ is
irreducible, (positive) recurrent and aperiodic (see, e.g., [6] and [7]). (If
$p_{\mathbf{j}} = 0$ for some $\mathbf{j}$, the methods developed in the present
paper can be modified in a straightforward manner to obtain analogous results
(see [2]).) If the observed sequence is from a chain of order $n - 1$, then the
occupation probability $p_{\mathbf{j}} = p_{j'} p_{\cdot j' j}$, where
$\mathbf{j} = (j', j)$, and $p_{j'}$ is the asymptotic occupation probability for
the first order chain with transition probability matrix
$P_{n-1} = [p_{i^* j^*}]$. (Lemma 1 in [6] gives a somewhat different, but
related, result for chains with denumerably many states.) Since
$\Pi_{\mathbf{r}}(f_{\cdot i^* j}, \mathbf{s}) > 0$, then $p_{\cdot s' s} > 0$
where $\mathbf{s} = (s', s)$, and $f_{\cdot s' s} / f_{\cdot s' \cdot}$ will
converge in probability to $p_{\cdot s' s}$. Thus, it will be necessary to prove
only that $T_{\mathbf{sr}}(f_{\mathbf{ij}}) / T_{s' r^*}(h_{i^* j^*})$ also
converges in probability to $p_{\cdot s' s}$.

We have that $T_{\mathbf{sr}}(f_{\mathbf{ij}})$ is the $(\mathbf{sr})$th cofactor
of the matrix
$\hat{M}_n = [\delta_{\mathbf{ij}} - f_{\mathbf{ij}} / f_{\mathbf{i}\cdot}]$, and
$T_{s' r^*}(h_{i^* j^*})$ is the $(s' r^*)$th cofactor of the matrix
$[\delta_{i^* j^*} - h_{i^* j^*} / h_{i^* \cdot}] = \hat{M}_{n-1}$. These matrices
will converge in probability to the $a^n \times a^n$ matrix
$M_n = [\delta_{\mathbf{ij}} - p_{\mathbf{ij}}]$ and the
$a^{n-1} \times a^{n-1}$ matrix $M_{n-1} = [\delta_{i^* j^*} - p_{i^* j^*}]$
respectively, and $T_{\mathbf{sr}}(f_{\mathbf{ij}})$ and
$T_{s' r^*}(h_{i^* j^*})$ will converge in probability to the $(\mathbf{sr})$th
cofactor and the $(s' r^*)$th cofactor of $M_n$ and $M_{n-1}$ respectively. Since
in each matrix the sum of the entries in each row is zero, the cofactors in row
$\mathbf{s}$ of $M_n$ are all equal to the $(\mathbf{ss})$th cofactor
$| M_{n,\mathbf{s}} |$ of $M_n$, and the cofactors in row $s'$ of $M_{n-1}$ are
all equal to the $(s' s')$th cofactor $| M_{n-1,s'} |$ of $M_{n-1}$. Therefore,
it will be necessary to prove only that
$| M_{n,\mathbf{s}} | = | M_{n-1,s'} | \, p_{\cdot s' s}$.

We have that $M_n = I_n - P_n$, where $I_n$ is the $a^n \times a^n$ identity
matrix and $P_n$ is the transition probability matrix for the first order chain
for the $Z$'s. We order the states $\mathbf{i} = (i_1, i^*)$ of this chain
recursively as follows: $(1, i^*) = 1i^*$, $(2, i^*) = 2i^*$, $\cdots$,
$(a, i^*) = ai^*$, for $i^* = 1, 2, \cdots, a^{n-1}$, obtaining a numbering of
$\mathbf{i}$ from 1 to $a^n$. Since $p_{\mathbf{ij}} = p_{\cdot i^* j_n}$ for
$j' = i^*$, and it is zero otherwise, we have that

(14) $$P_n = \begin{bmatrix}
  P_{\cdot 1} & P_{\cdot 2} & \cdots & P_{\cdot a} \\
  P_{\cdot 1} & P_{\cdot 2} & \cdots & P_{\cdot a} \\
  \vdots & \vdots & & \vdots \\
  P_{\cdot 1} & P_{\cdot 2} & \cdots & P_{\cdot a}
\end{bmatrix},$$

where $P_{\cdot j}$ is the $a^{n-1} \times a^{n-1}$ matrix $[{}_j p_{i^* j^*}]$
with ${}_j p_{i^* j^*} = p_{i^* j^*}$ for $i_2 = j$, and ${}_j p_{i^* j^*} = 0$
for $i_2 \ne j$, for all $j^*$ and $I$. Hence $P_n$ consists of $a$ block columns
$[P_{\cdot j}, P_{\cdot j}, \cdots, P_{\cdot j}]'$ for $j = 1, 2, \cdots, a$; it
also consists of $a$ block rows $[P_{\cdot 1}, P_{\cdot 2}, \cdots, P_{\cdot a}]$.
We have that $| M_{n,\mathbf{s}} |$ is the determinant of the
$(a^n - 1) \times (a^n - 1)$ matrix $M_{n,\mathbf{s}}$ obtained by deleting the
row and column relating to state $\mathbf{s} = (s_1, s^*)$, the $s_1 s^*$th state
in $M_n$; i.e., column $s^*$ within the $s_1$th block column
$[P_{\cdot s_1}, P_{\cdot s_1}, \cdots, P_{\cdot s_1}]'$ and also row $s^*$
within the $s_1$th block row $[P_{\cdot 1}, P_{\cdot 2}, \cdots, P_{\cdot a}]$,
and the corresponding column and row in the identity matrix. Let
$P^0_{\cdot s_1}$ be the $a^{n-1} \times a^{n-1}$ matrix obtained by replacing
the $s^*$th column in $P_{\cdot s_1}$ by a column of zeros. By some elementary
transformations of the matrix $M_{n,\mathbf{s}}$, we find that
$| M_{n,\mathbf{s}} |$ is equal to the determinant $| M^0 |$ of the
$a^{n-1} \times a^{n-1}$ matrix
$M^0 = I_{n-1} - [\sum_{j \ne s_1} P_{\cdot j} + P^0_{\cdot s_1}]$. Thus, it is
necessary to prove only that $| M^0 | = | M_{n-1,s'} | \, p_{\cdot s' s}$. It can
be seen that the only distinction between $M^0$ and $M_{n-1}$ is that the term
$-p_{\cdot s' s}$ appearing in row $s'$ and column $s^*$ of $M_{n-1}$ is replaced
by a zero in $M^0$. (If $s^* = s'$, then the term $1 - p_{\cdot s' s}$ is
replaced by 1.) Thus, each cofactor in the $s'$th row of $M^0$ is equal to
$| M_{n-1,s'} |$. Since the sum of the entries in row $s'$ of $M^0$ is

$$1 - \sum_{j^* \ne s^*} p_{s' j^*} = p_{\cdot s' s},$$

we have that $p_{\cdot s' s} \, | M_{n-1,s'} | = | M^0 |$. Q.E.D.
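The determinant identity just proved can be checked on the smallest nontrivial case, $a = 2$ and $n = 2$, where $s'$ is a single state. The sketch below is an illustration only: the transition probabilities are arbitrary values chosen here, states are 0-based, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def principal_cofactor(p, k):
    """Determinant of I - P after deleting row k and column k."""
    size = len(p)
    m = [[Fraction(int(i == j)) - p[i][j] for j in range(size)] for i in range(size)]
    return det([row[:k] + row[k + 1:] for i, row in enumerate(m) if i != k])


def lift(p1):
    """P_2 on pairs (i_1, i_2) built from a first order matrix P_1 (chain of order n - 1 = 1)."""
    a = len(p1)
    states = list(product(range(a), repeat=2))
    p2 = [[p1[i[1]][j[1]] if i[1] == j[0] else Fraction(0) for j in states]
          for i in states]
    return states, p2


# Illustrative first order chain on two states (values are an assumption).
p1 = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 4), Fraction(3, 4)]]
states, p2 = lift(p1)
# For each state s = (s', s): |M_{2,s}| versus |M_{1,s'}| * p_{s's}.
checks = [principal_cofactor(p2, k) == principal_cofactor(p1, s1) * p1[s1][s]
          for k, (s1, s) in enumerate(states)]
```

Every entry of `checks` is `True`; for the state $(1, 1)$ of the paper (index 0 here) both sides equal 1/12.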

This result, concerning the asymptotic equivalence, under the assumption
$H_{n-1}$, of
$\Pi_{\mathbf{r}}(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})$ and
$P(f_{i_1 i^* j} \mid f_{\cdot i^* j}, f_{i_1 i^* \cdot})$, implies that the null
hypothesis $H_{n-1}$ can be tested, within $H_n$, by any asymptotic test of
contingency in the $a^{n-1}$ ordinary $a \times a$ contingency tables with cell
entries $f_{i_1 i^* j}$ and with assigned marginals $f_{\cdot i^* j}$ and
$f_{i_1 i^* \cdot}$.


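4. The m-tuple and the n-tuple counts (m > n). Let $\iota$ be the
$(m - n)$-tuple $(i_1, i_2, \cdots, i_{m-n})$ and $I$ the $(n - 1)$-tuple
$(i_{m-n+1}, i_{m-n+2}, \cdots, i_{m-1})$. Denote the $m$-tuple count in the
sequence $X_1, X_2, \cdots, X_N$ by $f_{\iota I j}$. Then
$\sum_j f_{\iota I j} = f_{\iota I \cdot}$ is the $(m - 1)$-tuple count for the
sequence $X_1, X_2, \cdots, X_{N-1}$, and
$\sum_\iota f_{\iota I j} = f_{\cdot I j}$ is the $n$-tuple count for the
sequence $X_{m-n+1}, X_{m-n+2}, \cdots, X_N$. Let

$$\Pi_{\mathbf{r}}(f_{\iota I j} \mid f_{\cdot I j}, \mathbf{s})$$

be the probability that the $m$-tuple count will be $f_{\iota I j}$, given that
$\sum_\iota f_{\iota I j} = f_{\cdot I j}$ and
$(X_1, X_2, \cdots, X_{m-1}) = \mathbf{r}$ and
$(X_{N-m+2}, \cdots, X_N) = \mathbf{s}$. If $m = n + 1$, the results in Section 3
give the formula for this probability. If $m = n + 2$, then
$\Pi_{\mathbf{r}}(f_{\iota I j} \mid f_{\cdot I j}, \mathbf{s})$ is equal to

$$\Pi_{\mathbf{r}}(f_{i_1 i_2 I j} \mid f_{\cdot i_2 I j}, \mathbf{s}) \;
  \Pi_{\mathbf{r}}(f_{\cdot i_2 I j} \mid f_{\cdot I j}, \mathbf{s}):$$

the first factor is the probability that the $m$-tuple count will be
$f_{i_1 i_2 I j}$, given that $\sum_{i_1} f_{i_1 i_2 I j} = f_{\cdot i_2 I j}$
and $(X_1, X_2, \cdots, X_{m-1}) = \mathbf{r}$ and
$(X_{N-m+2}, \cdots, X_N) = \mathbf{s}$; the second factor is the probability
that the $(m - 1)$-tuple count in the sequence $X_2, X_3, \cdots, X_N$ will be
$f_{\cdot i_2 I j}$, given that $\sum_{i_2} f_{\cdot i_2 I j} = f_{\cdot I j}$
and $(X_1, X_2, \cdots, X_{m-1}) = \mathbf{r}$ and
$(X_{N-m+2}, \cdots, X_N) = \mathbf{s}$. If the chain is of order $n$, the
results in Section 3 indicate that the first factor is asymptotically equivalent
to

(15) $$\prod_{i_2, I} \left[\frac{\prod_{i_1} f_{i_1 i_2 I \cdot}! \prod_j f_{\cdot i_2 I j}!}{f_{\cdot i_2 I \cdot}! \prod_{i_1, j} f_{i_1 i_2 I j}!}\right];$$

if the chain is of order $n - 1$, the second factor is asymptotically equivalent
to

(16) $$\prod_{I} \left[\frac{\prod_{i_2} f_{\cdot i_2 I \cdot}! \prod_j f_{\cdot I j}!}{f_{\cdot I \cdot}! \prod_{i_2, j} f_{\cdot i_2 I j}!}\right],$$

since it can be shown from the derivation of (12) that
$\Pi_{\mathbf{r}}(f_{\cdot i_2 I j} \mid f_{\cdot I j}, \mathbf{s})$ is
asymptotically equivalent to
$\Pi_{r^*}(f_{\cdot i_2 I j} \mid f_{\cdot I j}, s^*)$, where
$\mathbf{s} = (s_1, s^*)$ and $\mathbf{r} = (r_1, r^*)$. Thus, for chains of
order $n - 1$, $\Pi_{\mathbf{r}}(f_{\iota I j} \mid f_{\cdot I j}, \mathbf{s})$
for $m = n + 2$ is asymptotically equivalent to the product of (15) and (16);
viz.

(17) $$\prod_{I} \left[\frac{\prod_{\iota} f_{\iota I \cdot}! \prod_j f_{\cdot I j}!}{f_{\cdot I \cdot}! \prod_{\iota, j} f_{\iota I j}!}\right].$$
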
2 


In the general case where $m > n$, by repeated application of the preceding
results for $m = n + 1$ and $n + 2$, we find that, for chains of order $n - 1$,
$\Pi_{\mathbf{r}}(f_{\iota I j} \mid f_{\cdot I j}, \mathbf{s})$ is
asymptotically equivalent to (17), the product of the probabilities
$P_I(f_{\iota I j} \mid f_{\cdot I j}, f_{\iota I \cdot})$ of the cell entries
$f_{\iota I j}$ in a contingency table (for a given $(n - 1)$-tuple $I$)
consisting of $a$ columns ($j = 1, 2, \cdots, a$) and $a^{m-n}$ rows (the
$a^{m-n}$ values of $\iota$), with assigned marginals $f_{\cdot I j}$ and
$f_{\iota I \cdot}$, when in each table the attributes described by the table
(there are $a^{n-1}$ tables) are independent. This result implies that the null
hypothesis $H_{n-1}$ can be tested, within the hypothesis $H_{m-1}$, by any
asymptotic test of contingency in the $a^{n-1}$ ordinary $a \times a^{m-n}$
contingency tables with cell entries $f_{\iota I j}$ and with assigned marginals
$f_{\cdot I j}$ and $f_{\iota I \cdot}$. These tests will have
$a^{n-1}(a - 1)(a^{m-n} - 1) = (a^m - a^n)(a - 1)/a$ degrees of freedom (see [1]
and [8]).
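The degrees-of-freedom count is a product over tables, which a short Python sketch makes explicit. The function names are hypothetical; the only assumption is $a \ge 2$ and $m > n \ge 1$.

```python
def contingency_degrees_of_freedom(a, n, m):
    """a**(n-1) tables, each a x a**(m-n), contribute (a - 1)(a**(m-n) - 1) each."""
    if a < 2 or n < 1 or m <= n:
        raise ValueError("require a >= 2 and m > n >= 1")
    tables = a ** (n - 1)
    return tables * (a - 1) * (a ** (m - n) - 1)


def closed_form(a, n, m):
    """The equivalent expression (a**m - a**n)(a - 1)/a, always an integer."""
    return (a ** m - a ** n) * (a - 1) // a
```

The two expressions agree because $a^{n-1}(a^{m-n} - 1) = (a^m - a^n)/a$. For example, $a = 2$, $n = 1$, $m = 2$ gives a single $2 \times 2$ table with one degree of freedom.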


5. The circular counts. It was shown in [4] that, for zero order chains, the
probability $\Pi(f_{i_1 \cdots i_m} \mid f_{i_1 \cdots i_n})$ of a specified
$m$-tuple count $f_{i_1 \cdots i_m}$ in a circular sequence, given the $n$-tuple
count $f_{i_1 \cdots i_n}$ ($n < m$), is

(18) $$\frac{C(F^*_m) \prod f_{i_1 \cdots i_n}!}{C(F^*_n) \prod f_{i_1 \cdots i_m}!} \quad \text{if } n > 1,
  \qquad
  \frac{C(F^*_m) \prod_i f_i!}{(N - 1)! \prod f_{i_1 \cdots i_m}!} \quad \text{if } n = 1,$$

where $F^*_m = [f_{i_1 \cdots i_m}]$ is the incidence matrix of the graph (see
[4]), and $C(M)$ is defined in Section 2 here; (18) is valid for chains of order
$n - 1$ or less. In the special case $n = 1$ it was proved in [4] that the
statistic (18) is asymptotically equivalent to
$\prod f_{i_1 \cdots i_{m-1}}! \prod_j f_j! / (N! \prod f_{i_1 \cdots i_m}!)$,
the probability $P(f_{i_1 \cdots i_m} \mid f_{i_1 \cdots i_{m-1}}, f_j)$ of the
cell entries $f_{i_1 \cdots i_m}$ in a contingency table with assigned marginals
$f_{i_1 \cdots i_{m-1}}$ and $f_j$. This result for the special case $n = 1$ will
now be generalized to the case $n \ge 1$.

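Let us first consider the case where $m = n + 1$. We can write

(19) $$F^*_m = D(f_{\mathbf{i}}) \left[\delta_{\mathbf{ij}} - \frac{f_{\mathbf{ij}}}{f_{\mathbf{i}}}\right],$$

where $\mathbf{i} = (i_1, i_2, \cdots, i_n)$, $f_{\mathbf{ij}}$ is defined for
circular sequences in the same way as $f_{\mathbf{ij}}$ was defined in Section 3,
and $D(f_{\mathbf{i}})$ is the $a^n \times a^n$ diagonal matrix where the entry
in row $\mathbf{i}$ is $f_{\mathbf{i}}$. We shall assume that no row or column
consists wholly of zeros. The common value $| F^*_{m,\mathbf{i}} |$ of the
cofactors of $F^*_m$ can be obtained by determining the $(\mathbf{ii})$th
cofactor of $D(f_{\mathbf{i}})$, which is
$\prod_{\mathbf{t} \ne \mathbf{i}} f_{\mathbf{t}}$, and also the
$(\mathbf{ii})$th cofactor of
$[\delta_{\mathbf{ij}} - f_{\mathbf{ij}} / f_{\mathbf{i}}]$, which converges in
probability to the $(\mathbf{ii})$th cofactor $| M_{n,\mathbf{i}} |$ of $M_n$.
From the results in Section 3, for the case where the chain is of order $n - 1$,
we see that $| M_{n,\mathbf{i}} | = | M_{n-1,i'} | \, p_{\cdot i' i_n}$, where
$\mathbf{i} = (i', i_n)$. Thus $| F^*_{m,\mathbf{i}} |$ is asymptotically
equivalent to
$\prod_{\mathbf{t} \ne \mathbf{i}} f_{\mathbf{t}} \, | M_{n-1,i'} | \, f_{\mathbf{i}} / f_{i'}$.
Also, the $(i' i')$th cofactor $| F^*_{m-1,i'} |$ of $F^*_{m-1}$ is
asymptotically equivalent to $\prod_{t^* \ne i'} f_{t^*} \, | M_{n-1,i'} |$.
Hence, $| F^*_{m,\mathbf{i}} | / | F^*_{m-1,i'} |$ is asymptotically equivalent
to $\prod_{\mathbf{i}} f_{\mathbf{i}} / \prod_{i^*} f_{i^*}$, and
$C(F^*_m) / C(F^*_{m-1})$ is asymptotically equivalent to
$\prod_{\mathbf{i}} f_{\mathbf{i}}! / \prod_{i^*} f_{i^*}!$.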


Therefore, if the chain is of order $n - 1$, (18) for $m = n + 1$ is
asymptotically equivalent to

(20) $$\prod_{i^*} \left[\frac{\prod_{i_1} f_{i_1 i^*}! \prod_j f_{i^* j}!}{f_{i^*}! \prod_{i_1, j} f_{i_1 i^* j}!}\right],$$

the product of the probabilities
$P_{i^*}(f_{i_1 i^* j} \mid f_{i_1 i^*}, f_{i^* j})$ of the observed cell entries
$f_{i_1 i^* j}$ in an ordinary $a \times a$ contingency table (for a given
$(n - 1)$-tuple $i^*$), with assigned marginals $f_{i_1 i^*}$ and $f_{i^* j}$ (we
have that $\sum_j f_{i_1 i^* j} = f_{i_1 i^* \cdot} = f_{i_1 i^*}$ and
$\sum_{i_1} f_{i_1 i^* j} = f_{\cdot i^* j} = f_{i^* j}$), where in each table
the two attributes described by the table are independent.


This result, concerning asymptotic equivalence in the special case $m = n + 1$,
can be applied repeatedly to obtain a general result for the case $m > n$, as
was done in Section 4. Thus, if the chain is of order $n - 1$, the statistic
$\Pi(f_{i_1 \cdots i_m} \mid f_{i_1 \cdots i_n})$ is asymptotically equivalent to

$$\prod_{I} \left[\frac{\prod_{\iota} f_{\iota I}! \prod_j f_{I j}!}{f_{I}! \prod_{\iota, j} f_{\iota I j}!}\right],$$

where $\iota = (i_1, i_2, \cdots, i_{m-n})$,
$I = (i_{m-n+1}, i_{m-n+2}, \cdots, i_{m-1})$, $j = i_m$. Therefore, any
asymptotic test of contingency in the $a^{n-1}$ ordinary $a \times a^{m-n}$
contingency tables (a table for each $(n - 1)$-tuple $I$) with cell entries
$f_{i_1 \cdots i_m}$ and with assigned marginals $f_{i_1 \cdots i_{m-1}}$ and
$f_{i_{m-n+1} \cdots i_m}$ (i.e., $f_{\iota I}$ and $f_{I j}$), can be used to
test the null hypothesis $H_{n-1}$ within $H_{m-1}$. The degrees of freedom are
as in Section 4.

The reader will note that, in this and the preceding sections, each $m$-tuple
was "split" into an $(m - n)$-tuple $\iota$, an $(n - 1)$-tuple $I$, and a
1-tuple $j$; thus obtaining $a^{n-1}$ contingency tables, each
$a \times a^{m-n}$. It is possible to split each $m$-tuple into an
$(m - n - r)$-tuple, an $(n - 1)$-tuple, and a $(1 + r)$-tuple
($0 \le r \le m - n - 1$); thus obtaining $a^{n-1}$ contingency tables, each
$a^{1+r} \times a^{m-n-r}$. For $r = m - n - 1$, the $m$-tuple is split into a
1-tuple, an $(n - 1)$-tuple, and an $(m - n)$-tuple; the $a^{n-1}$ contingency
tables obtained will differ in general from the $a^{n-1}$ tables obtained for
$r = 0$. However, for circular sequences, the product of the likelihood ratios
(for testing independence in each table) for the $a^{n-1}$ tables obtained when
$r = m - n - 1$ will be equal to the corresponding product for the tables
obtained when $r = 0$. For linear sequences, the corresponding products when
$r = m - n - 1$ and $r = 0$ will be asymptotically equivalent, under $H_{n-1}$.
Both these products are asymptotically equivalent to the likelihood ratio for
testing $H_{n-1}$ within $H_{m-1}$. Similar remarks could be made about other
statistics (e.g., the $\chi^2$ statistic) used to test independence in each
table. If the $a^{n-1}$ separate tables were of interest, the choice between
$r = 0$ or $m - n - 1$ would depend on the alternate hypotheses within $H_{m-1}$
that were in mind.

For $0 \le r \le m - n - 1$, it can be shown that the asymptotic mean value
of the product of the likelihood ratios (when normed in the usual way) is equal
to $a^{n-1}(a^{m-n-r} - 1)(a^{1+r} - 1)$, under $H_{n-1}$. This statistic is not
equivalent to the likelihood ratio for testing $H_{n-1}$ within $H_{m-1}$, unless
$r = 0$ or $m - n - 1$. Also, the asymptotic distribution, given $H_{n-1}$, of
this statistic is not $\chi^2$ unless $r = 0$ or $m - n - 1$. For
$0 < r < m - n - 1$, the asymptotic distribution, given $H_{n-1}$, of this
statistic is that of a weighted sum (with unequal weights) of $\chi^2$ variates;
in this case, the analysis of the $a^{n-1}$ separate contingency tables is not in
general as simple and straightforward as when $r = 0$ or $m - n - 1$, since the
usual methods of analysis of contingency tables cannot be applied to this case.
This case will be discussed more fully in a later publication by the present
author.


6. The exact probability formulas. An illustration will now be presented to
indicate the difference between (12), (18), and the formula suggested in [4] for
the probability of the specified $m$-tuple count, given the $n$-tuple count, in a
linear sequence. Consider the special case $a = 2$, $n = 1$, $m = 2$, $N = 5$,
and $f_{11} = 0$, $f_{12} = 2$, $f_{21} = 1$, $f_{22} = 1$. Thus
$f_{1\cdot} = 2$, $f_{2\cdot} = 2$, $f_{\cdot 1} = 1$, $f_{\cdot 2} = 3$. From
(1), we see that $r = 1$, $s = 2$. From (6), the probability of the specified
2-tuple count $f_{ij}$, given the 1-tuple count $f_{\cdot j}$ and $f_{i\cdot}$
and $r$, is 2/3. The circularized 2-tuple frequencies are $f_{11} = 0$,
$f_{12} = 2$, $f_{21} = 2$, $f_{22} = 1$. From (18), the probability of the
$f_{ij}$, given the $f_j$, is 1/2. Using the approach suggested in [4] of
applying (18) to the augmented circularized sequence, the probability of the
specified 2-tuple count $f_{ij}$, given the 1-tuple count $f_j$, is 1/5; this
approach yields a correct answer only if the chain is stationary. By listing all
possible sequences for $a = 2$ and $N = 5$, the reader will see why different
numerical results are obtained for the different probabilities.
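The listing of all sequences suggested above can be done mechanically. The following Python sketch is an illustration, with hypothetical function names, that reproduces the values 2/3 and 1/2. It assumes a zero order chain, so that all sequences sharing the conditioning counts are equally likely and each conditional probability is a ratio of counts. The value 1/5 from the augmented circularized sequence is not reproduced here, since that construction is only described in [4].

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N = 5
STATES = (1, 2)


def pair_counts(seq, circular=False):
    """2-tuple count of a sequence; the circular version adds the pair (X_N, X_1)."""
    pairs = list(zip(seq, seq[1:]))
    if circular:
        pairs.append((seq[-1], seq[0]))
    return Counter(pairs)


def linear_probability():
    """P(f_ij | f_.j, f_i., r = 1) for f_11 = 0, f_12 = 2, f_21 = 1, f_22 = 1."""
    target = Counter({(1, 2): 2, (2, 1): 1, (2, 2): 1})
    col = Counter({1: 1, 2: 3})  # f_.j: counts among X_2, ..., X_N
    row = Counter({1: 2, 2: 2})  # f_i.: counts among X_1, ..., X_{N-1}
    pool = [s for s in product(STATES, repeat=N)
            if s[0] == 1 and Counter(s[1:]) == col and Counter(s[:-1]) == row]
    hits = [s for s in pool if pair_counts(s) == target]
    return Fraction(len(hits), len(pool))


def circular_probability():
    """P(circular f_ij | f_j) for f_11 = 0, f_12 = 2, f_21 = 2, f_22 = 1."""
    target = Counter({(1, 2): 2, (2, 1): 2, (2, 2): 1})
    pool = [s for s in product(STATES, repeat=N) if Counter(s) == Counter({1: 2, 2: 3})]
    hits = [s for s in pool if pair_counts(s, circular=True) == target]
    return Fraction(len(hits), len(pool))
```

`linear_probability()` returns 2/3 (two of the three admissible sequences, 1 2 1 2 2 and 1 2 2 1 2, have the specified count) and `circular_probability()` returns 1/2 (five of the ten arrangements).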
TESTS OF MULTIPLE INDEPENDENCE AND THE ASSOCIATED
CONFIDENCE BOUNDS¹


By S. N. Roy and R. E. Bargmann
University of North Carolina


1. Summary. In this paper a test based on the union-intersection principle is
proposed for overall independence between $p$ variates distributed according to
the multivariate normal law, and this is extended to the hypothesis of
independence between several groups of variates which have a joint multivariate
normal distribution. Methods used in earlier papers [3, 4] have been applied in
order to invert these tests for each situation, and to obtain, with a joint
confidence coefficient greater than or equal to a preassigned level, simultaneous
confidence bounds on certain parametric functions. These parametric functions
are, in case I, the moduli of the regression vectors: (a) of the variate $p$ on
the variates $(p - 1), (p - 2), \cdots, 2, 1$, or on any subset of the latter;
(b) of the variate $(p - 1)$ on the variates $(p - 2), (p - 3), \cdots, 2, 1$, or
any subset of the latter, etc.; and finally, (c) of the variate 2 on the variate
1. For case II, parallel to each case considered above, there is an analogous
statement in which the regression vector is replaced by a regression matrix,
$\beta$, say, and the "modulus" of the regression vector is replaced by the
(positive) square-root of the largest characteristic root of $(\beta\beta')$.
Simultaneous confidence bounds on these sets of parameters are given. As far as
the proposed tests of hypotheses of multiple independence are concerned they are
offered as an alternative to another class of tests based on the
likelihood-ratio criterion [5, 6] which has been known for a long time. So far as
the confidence bounds are concerned it is believed, however, that no other easily
obtainable confidence bounds are available in this area. One of the objects of
these confidence bounds is the detection of the "culprit variates" in the case of
rejection of the hypothesis of multiple independence, for the "complex"
hypothesis is, in this case, the intersection of several more "elementary"
hypotheses of two-by-two independence.


2. Introduction, notation, and preliminaries. Case I, which deals with the 
question of independence among p normally distributed variates, represents a 
well known situation which has occurred repeatedly in applications. For case II, 
which deals with the question of independence between k sets of normally dis- 
tributed variates (where each set contains one or more variates), a number of 
potential applications has been described by Wilks [6]. In addition to the
situations mentioned by Wilks, an interesting application concerns the problem of
"unreliable measurement". If we consider the $p_i$ variates of set $i$
($i = 1, 2, \cdots, k$) as different measurements of a physically identical
quantity which, because of the inaccuracy of the measuring instrument, are not in
perfect correlation, the procedures outlined in sections 5 and 6 will study the
independence or dependence of
the k underlying “true” physical quantities. The “correction-for-attenuation” 
technique which is widely used in this situation has two serious drawbacks which 
the present method tries to overcome: (1) It assumes equal error variance for 
each fallible measurement, and (2) it makes use of a statistic, the “correlation 
corrected for attenuation”, which is not a correlation coefficient, for it may 
attain values greater than unity. The present method is free from these short- 
comings. The confidence bounds discussed in section 6 will then give an indica- 
tion of the maximum attainable degree of prediction if infallible measurements 
could be made. 

Suppose we have a random sample of size $n + 1$ from an $N[\xi(p \times 1), \Sigma(p \times p)]$,
with $p < n$. Then, denoting by $S(p \times p)$ the sample dispersion matrix, we know
that $S$ is symmetric and everywhere at least p.s.d. (and also p.d., a.e.). It is also
well known that, a.e., there exists a one-to-one transformation from $S(p \times p)$
to $T(p \times p)$ given by $nS = TT'$, where $T$ is a lower triangular matrix with
positive diagonal elements. Let $t_{ij}$ ($i \ge j = 1, 2, \ldots, p$) denote the elements
of $T$, $s_{ij}$ and $\sigma_{ij}$ ($s_{ij} = s_{ji}$, $\sigma_{ij} = \sigma_{ji}$, $i, j = 1, 2, \ldots, p$) denote the elements
of $S$ and $\Sigma$, and let $s^{ij}$ and $\sigma^{ij}$ denote the elements of $S^{-1}$ and $\Sigma^{-1}$. Furthermore,
let $r_{p\cdot 1,2,\ldots,(p-1)}$, $r_{(p-1)\cdot 1,2,\ldots,(p-2)}$, $\ldots$, $r_{3\cdot 1,2}$, and $r_{2\cdot 1}$ denote, respectively,
the multiple correlation coefficient of $(p)$ with $(1, 2, \ldots, p - 1)$, of $(p - 1)$
with $(1, 2, \ldots, p - 2)$, and so on, and finally the simple correlation coefficient
of (2) with (1). It may be noted that all except the last are non-negative and
a.e. positive. These multiple correlation coefficients will be called the step-down
correlations. Likewise, let


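$$\beta_{i\cdot 1,2,\ldots,i-1}(1 \times \overline{i-1}) = [\sigma_{1i}\ \sigma_{2i}\ \cdots\ \sigma_{i-1,i}] \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1,i-1} \\ \vdots & \vdots & & \vdots \\ \sigma_{1,i-1} & \sigma_{2,i-1} & \cdots & \sigma_{i-1,i-1} \end{bmatrix}^{-1} \tag{2.1}$$

(for $i = p, p - 1, \ldots, 2$) denote the population regression vector of $(i)$ on
$(1, 2, \ldots, i - 1)$ and

$$b_{i\cdot 1,2,\ldots,i-1}(1 \times \overline{i-1}) = [s_{1i}\ s_{2i}\ \cdots\ s_{i-1,i}] \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1,i-1} \\ \vdots & \vdots & & \vdots \\ s_{1,i-1} & s_{2,i-1} & \cdots & s_{i-1,i-1} \end{bmatrix}^{-1} \tag{2.2}$$

(for $i = p, p - 1, \ldots, 2$) denote the corresponding sample regression vector.
These regression vectors will be called the step-down regression vectors.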

Next, we will present the expression for the multiple correlation coefficient by 
treating it as a special case of a canonical correlation (which will be convenient 
for later purposes). We have 
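$$r^2_{i\cdot 1,2,\ldots,i-1} = \frac{1}{s_{ii}}\, [s_{1i}\ \cdots\ s_{i-1,i}] \begin{bmatrix} s_{11} & \cdots & s_{1,i-1} \\ \vdots & & \vdots \\ s_{1,i-1} & \cdots & s_{i-1,i-1} \end{bmatrix}^{-1} \begin{bmatrix} s_{1i} \\ \vdots \\ s_{i-1,i} \end{bmatrix} \tag{2.3}$$

for $i = p, p - 1, \ldots, 2$. Next we have, by using $nS = TT'$,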
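$$\begin{aligned}
n s_{ii} &= [t_{i1}\ \cdots\ t_{ii}]\,[t_{i1}\ \cdots\ t_{ii}]' = \sum_{j=1}^{i} t_{ij}^2, \\
n [s_{1i}\ \cdots\ s_{i-1,i}] &= [t_{i1}\ \cdots\ t_{i,i-1}] \begin{bmatrix} t_{11} & 0 & \cdots & 0 \\ t_{21} & t_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ t_{i-1,1} & t_{i-1,2} & \cdots & t_{i-1,i-1} \end{bmatrix}', \\
n \begin{bmatrix} s_{11} & \cdots & s_{1,i-1} \\ \vdots & & \vdots \\ s_{1,i-1} & \cdots & s_{i-1,i-1} \end{bmatrix} &= \begin{bmatrix} t_{11} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ t_{i-1,1} & t_{i-1,2} & \cdots & t_{i-1,i-1} \end{bmatrix} \begin{bmatrix} t_{11} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ t_{i-1,1} & t_{i-1,2} & \cdots & t_{i-1,i-1} \end{bmatrix}'.
\end{aligned} \tag{2.3.1}$$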

It is then easy to check, by substituting the expressions (2.3.1) into (2.3), that

$$r_{i\cdot 1,2,\ldots,i-1} = +\left[\, \sum_{j=1}^{i-1} t_{ij}^2 \Big/ \sum_{j=1}^{i} t_{ij}^2 \right]^{1/2} \tag{2.4}$$

for $i = p, p - 1, \ldots, 2$.
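The equality of (2.3) and (2.4) can be checked numerically. The following is a minimal sketch in plain Python, assuming an arbitrary illustrative triangular matrix `T`; the helper names (`gram`, `solve`, `r2_via_dispersion`, `r2_via_triangular`) are hypothetical and not part of the paper. Since (2.3) is unchanged when $S$ is multiplied by a constant, the sketch works directly with $nS = TT'$.

```python
def gram(T):
    """Return T T' for a square matrix T given as a list of rows."""
    p = len(T)
    return [[sum(T[i][k] * T[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def r2_via_dispersion(S, i):
    """Squared multiple correlation of variate i (1-based) on 1..i-1, as in (2.3)."""
    s = [S[j][i - 1] for j in range(i - 1)]
    A = [[S[j][k] for k in range(i - 1)] for j in range(i - 1)]
    w = solve(A, s)
    return sum(a * b for a, b in zip(s, w)) / S[i - 1][i - 1]


def r2_via_triangular(T, i):
    """The same quantity from row i of the triangular factor, as in (2.4)."""
    row = T[i - 1]
    return sum(t * t for t in row[:i - 1]) / sum(t * t for t in row[:i])


# Illustrative lower triangular T with positive diagonal (an assumption).
T = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [-1.0, 2.0, 1.5]]
nS = gram(T)
for i in (2, 3):
    print(i, r2_via_dispersion(nS, i), r2_via_triangular(T, i))
```

For each $i$ the two printed values agree up to rounding, which is the content of (2.4).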
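Now, let us turn to the case of a $(p_1 + p_2 + \cdots + p_k)$-variate ($= p$-variate,
say) normal distribution and partition the population dispersion matrix, $\Sigma$,
into

$$\Sigma(p \times p) = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1k} \\ \Sigma_{12}' & \Sigma_{22} & \cdots & \Sigma_{2k} \\ \vdots & \vdots & & \vdots \\ \Sigma_{1k}' & \Sigma_{2k}' & \cdots & \Sigma_{kk} \end{bmatrix} \tag{2.5}$$

and the sample dispersion matrix, $S$, into

$$S(p \times p) = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1k} \\ S_{12}' & S_{22} & \cdots & S_{2k} \\ \vdots & \vdots & & \vdots \\ S_{1k}' & S_{2k}' & \cdots & S_{kk} \end{bmatrix} \tag{2.6}$$

where, in both matrices, the successive row blocks have $(p_1), (p_2), \ldots, (p_k)$ rows and
the successive column blocks have $(p_1), (p_2), \ldots, (p_k)$ columns.
Regarding each submatrix as an element let us say that there are $k$ “pseudo-rows”
and $k$ “pseudo-columns” in the matrices on the right sides of (2.5) and (2.6).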
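Let $\beta_{i\cdot 1,2,\ldots,i-1}$ and $B_{i\cdot 1,2,\ldots,i-1}$ (for $i = k, k - 1, \ldots, 2$) denote the population
and the sample regression matrix of the $(p_i)$-set on the $(p_{i-1} + p_{i-2} + \cdots + p_1)$-set.
These matrices are given by the expressions:

$$\beta_{i\cdot 1,2,\ldots,i-1}(p_i \times \overline{p_{i-1} + p_{i-2} + \cdots + p_1}) = [\Sigma_{1i}'\ \Sigma_{2i}'\ \cdots\ \Sigma_{i-1,i}'] \begin{bmatrix} \Sigma_{11} & \cdots & \Sigma_{1,i-1} \\ \vdots & & \vdots \\ \Sigma_{1,i-1}' & \cdots & \Sigma_{i-1,i-1} \end{bmatrix}^{-1} \tag{2.7}$$

$$B_{i\cdot 1,2,\ldots,i-1}(p_i \times \overline{p_{i-1} + p_{i-2} + \cdots + p_1}) = [S_{1i}'\ S_{2i}'\ \cdots\ S_{i-1,i}'] \begin{bmatrix} S_{11} & \cdots & S_{1,i-1} \\ \vdots & & \vdots \\ S_{1,i-1}' & \cdots & S_{i-1,i-1} \end{bmatrix}^{-1} \tag{2.8}$$

for $i = k, k - 1, \ldots, 2$. The $\beta$'s and $B$'s will be called the step-down regression
matrices. Also let us denote by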


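$$0 \le c^{(1)}_{i\cdot 1,2,\ldots,i-1} \le c^{(2)}_{i\cdot 1,2,\ldots,i-1} \le \cdots \le c^{(p_i)}_{i\cdot 1,2,\ldots,i-1} \le 1 \tag{2.8.1}$$

the $p_i$ characteristic roots of the matrix

$$S_{ii}^{-1}\, [S_{1i}'\ S_{2i}'\ \cdots\ S_{i-1,i}'] \begin{bmatrix} S_{11} & \cdots & S_{1,i-1} \\ \vdots & & \vdots \\ S_{1,i-1}' & \cdots & S_{i-1,i-1} \end{bmatrix}^{-1} \begin{bmatrix} S_{1i} \\ \vdots \\ S_{i-1,i} \end{bmatrix} \tag{2.8.2}$$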
for $i = k, k - 1, \ldots, 2$. It will be noticed that these $c$'s are the squares of the
canonical correlation coefficients of the $(p_i)$-set with the $(p_1 + p_2 + \cdots + p_{i-1})$-set
and that, a.e., the inequalities in (2.8.1) will be strict. For any $(p_i)$-set, all
the $p_i$ canonical correlation coefficients (or rather, as will be seen later, the
largest of them) will play the same role as $r_{i\cdot 1,2,\ldots,i-1}$ in the previous case. These
will be called the step-down sets of canonical correlations. We are assuming here
for simplicity of discussion, but without loss of generality, that the sets are so
numbered as to make $\sum_{j=1}^{i-1} p_j \ge p_i$ (for $i = 2, 3, \ldots, k$). The matrix
corresponding to $T$ in the previous case will be introduced in a later section.
Sections 3 and 4 will be concerned with the first case, i.e., the case of a
$p$-variate normal distribution; section 3 will describe a test of the hypothesis
$H_0: \sigma_{ij} = 0$ ($i \ne j = 1, 2, \ldots, p$), and section 4 will present simultaneous
confidence bounds, for $i = p, p - 1, \ldots, 2$, on
$(\beta_{i\cdot 1,2,\ldots,i-1}\,\beta'_{i\cdot 1,2,\ldots,i-1})^{1/2}$ (and on
truncations obtained by deleting any $1, 2, \ldots, (i - 2)$ variates of the $(i - 1)$-set).

Sections 5 and 6 will be concerned with the second case, i.e., that of a
$(p_1 + p_2 + \cdots + p_k)$-variate normal distribution; section 5 will describe a test of the
hypothesis $H_0: \Sigma_{ij} = 0$ ($i \ne j = 1, 2, \ldots, k$), and section 6 will present
simultaneous confidence bounds, for $i = k, k - 1, \ldots, 2$, on the largest characteristic
root of $(\beta_{i\cdot 1,2,\ldots,i-1}\,\beta'_{i\cdot 1,2,\ldots,i-1})$ (and on truncations obtained by deleting any
$1, 2, \ldots, (i - 2)$ of the sets $(p_1), (p_2), \ldots, (p_{i-1})$). It will be noted that, in
case I, the variate $(i)$ is independent of the variates $1, 2, \ldots, i - 1$ if and only
if $\beta_{i\cdot 1,2,\ldots,i-1} = 0$, and, in case II, independence of the $(p_i)$-set on the
$[(p_1), (p_2), \ldots, (p_{i-1})]$-set implies, and is implied by, the vanishing of the
largest characteristic root of $(\beta_{i\cdot 1,2,\ldots,i-1}\,\beta'_{i\cdot 1,2,\ldots,i-1})$.

3. Independence in the p-variate problem. 


3.1. Independence (in distribution) of the step-down correlations, under the
null hypothesis. The joint distribution of the $t_{ij}$'s, for general $\Sigma$, is well known
and given by

$$\text{const} \cdot \exp\left[ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} T T' \right] \prod_{i=1}^{p} t_{ii}^{\,n-i} \prod_{i \ge j = 1}^{p} dt_{ij}. \tag{3.1.1}$$

Among various proofs, a recent one is given in [1, 2]. Under the null hypothesis
we have $\Sigma = D_{\sigma_{ii}}$, where the right side denotes a diagonal matrix with
elements $\sigma_{11}, \sigma_{22}, \ldots, \sigma_{pp}$. In this situation, (3.1.1) reduces to

$$\text{const} \cdot \exp\left[ -\tfrac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{i} t_{ij}^2 / \sigma_{ii} \right] \prod_{i=1}^{p} t_{ii}^{\,n-i} \prod_{i=1}^{p} \prod_{j=1}^{i} dt_{ij}.$$

It should be noticed that the $t_{ii}$'s vary from $0$ to $\infty$, and the $t_{ij}$'s ($i > j$) from $-\infty$ to
$\infty$. Now a comparison with the expression (2.4) shows immediately that the
$r_{i\cdot 1,2,\ldots,i-1}$'s ($i = p, \ldots, 2$) are independently distributed, and their joint
distribution is given by

$$\text{const} \cdot \prod_{i=2}^{p} \left( r^2_{i\cdot 1,2,\ldots,i-1} \right)^{(i-3)/2} \left( 1 - r^2_{i\cdot 1,2,\ldots,i-1} \right)^{(n-i-1)/2} d\!\left( r^2_{i\cdot 1,2,\ldots,i-1} \right).$$

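3.2. The proposed test and a reason for the allocation of component
probabilities. The proposed test is as follows:

$$\text{Accept } H_0 \text{ over } \bigcap_{i=2}^{p} \left[ r^2_{i\cdot 1,2,\ldots,i-1} \le u \right], \text{ and reject } H_0 \text{ over } \bigcup_{i=2}^{p} \left[ r^2_{i\cdot 1,2,\ldots,i-1} > u \right],$$

where $u$ is given by

$$\prod_{i=2}^{p} P\left[ r^2_{i\cdot 1,2,\ldots,i-1} \le u \mid \rho_{i\cdot 1,2,\ldots,i-1} = 0 \right] = 1 - \alpha. \tag{3.2.2}$$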

To obtain $u$, proceed as follows: Take a trial value, $u_1$, say, between zero and one.
Using this value for given $n$ and for $i = 2, 3, \ldots, p$ obtain, from the Tables of the
Incomplete Beta Function, the probabilities corresponding to the individual
factors on the left side of (3.2.2); call these probabilities $\gamma_2, \gamma_3, \ldots, \gamma_p$, say; form
the product of the $\gamma_i$'s and denote it by $\gamma$. Proceed in the same manner for other
trial values, $u_2$, $u_3$, etc., and plot the $u_j$'s against the resulting $\gamma$'s. Then, on this
plot, $u$ in (3.2.2) is that value of the $u_j$'s which corresponds to $\gamma = 1 - \alpha$, the
preassigned confidence level.
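The trial-value procedure for $u$ can be sketched in code. The following is a minimal illustration, not the authors' method: it replaces the Tables of the Incomplete Beta Function by a standard continued-fraction evaluation of the regularized incomplete beta function, and the graphical interpolation by bisection. It assumes, from the joint distribution in section 3.1, that under the null hypothesis $r^2_{i\cdot 1,2,\ldots,i-1}$ follows a Beta distribution with parameters $(i-1)/2$ and $(n-i+1)/2$. All function names are hypothetical.

```python
import math


def _guard(value, tiny=1e-300):
    """Keep a denominator away from exact zero."""
    return tiny if abs(value) < tiny else value


def _beta_continued_fraction(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction used for the incomplete beta function (Lentz's method)."""
    c = 1.0
    d = 1.0 / _guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 / _guard(1.0 + even * d)
        c = _guard(1.0 + even / c)
        h *= d * c
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 / _guard(1.0 + odd * d)
        c = _guard(1.0 + odd / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def acceptance_probability(u, n, p):
    """Left side of (3.2.2): the product of the component probabilities gamma_i."""
    gamma = 1.0
    for i in range(2, p + 1):
        gamma *= reg_inc_beta((i - 1) / 2.0, (n - i + 1) / 2.0, u)
    return gamma


def solve_u(alpha, n, p, tol=1e-10):
    """Bisection on u in (0, 1) so that the product equals 1 - alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if acceptance_probability(mid, n, p) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(solve_u(0.05, 20, 4))
```

Bisection is adequate here because each factor, and hence the product, is nondecreasing in $u$.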


It is obvious that, if the same $u$ goes with the different component regions
$[r^2_{i\cdot 1,2,\ldots,i-1} \le u]$, the probability measures that go with these regions are all
different. One reason why we make this kind of allocation of the different $\gamma_i$'s is
the following: Notice that the acceptance region for $H_0$ is the intersection (over
$i = p, p - 1, \ldots, 2$) of regions $[r^2_{i\cdot 1,2,\ldots,i-1} \le u]$; but, for a given $i$,
$[r^2_{i\cdot 1,2,\ldots,i-1} \le u]$ is itself the intersection of regions of the type
$[r^2_{i\cdot (a_1, a_2, \ldots, a_{i-1})} \le u]$, where $r_{i\cdot (a_1, a_2, \ldots, a_{i-1})}$ denotes the (simple) correlation
of the $i$th variate with any linear
combination of the variates $(1, 2, \ldots, i - 1)$ which includes, as a special case,
the variates $(1, 2, \ldots, i - 1)$ individually. Thus, if we allocate the $\gamma_i$'s in such a
way as to make $u$ the same for each component region, we attach the same weight
not only to the correlations between any pair of the observed variates but also to
the correlations between each variate and linear combinations of some others.
The reader will perceive that this allocation is not completely symmetric. While
symmetry is preserved with respect to all correlations by pairs, the step-down
procedure is asymmetric as regards the correlation of any variate with any
linear combination of all the other variates. However, this is perhaps the best
that could be done under this particular approach. It should be noted that, if the
square of any simple correlation in the correlation matrix exceeds the value $u$,
we will have to reject the hypothesis of independence. If, however, the square of
the largest correlation coefficient in the correlation matrix stays below $u$, we will
have to perform the step-down process in order to decide on acceptance or
rejection of the hypothesis of independence.


3.3. Relation to the likelihood-ratio test. Since the determinant of $R$, the
correlation matrix, equals $(1 - r^2_{p\cdot 1,2,\ldots,p-1})(1 - r^2_{(p-1)\cdot 1,2,\ldots,p-2}) \cdots (1 - r^2_{2\cdot 1})$,
a test based on the product of the complements of the squares of the step-down
correlations is equivalent to the likelihood-ratio test. While the distribution of
the determinant of $R$ is fairly complicated, even under the null hypothesis of
independence [7], its moments [6] are well known and easily obtained from the joint
distribution of the correlation coefficients under the hypothesis of independence.
It can be easily verified that they satisfy the recurrence relations:

$$\mu'_{h+1} = \mu'_h \prod_{i=1}^{p-1} \left( 1 - \frac{i}{n + 2h} \right),$$

where $\mu'_h$ denotes the $h$th moment of $|R|$.


From these relations it is quite simple to obtain the moments, hence the
coefficients of skewness and kurtosis, and, from Table 42 of the Biometrika Tables
for Statisticians [8], we can obtain, at least for moderately large $n$, very good
approximations to the desired percentage points of the cdf. Thus, for testing the
hypothesis of independence, the determinant test is quite useful and closely
related to the step-down procedure presented in this paper. At the moment,
however, we do not know of any method to use this determinant test for the
construction of confidence bounds on parametric functions without running into
complicated non-central distributions, whereas the step-down procedure, as
described above, can be immediately inverted for the purpose of constructing
simultaneous confidence bounds.
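The determinant identity that opens section 3.3 can be checked numerically. The sketch below, in plain Python, assumes an arbitrary illustrative triangular factor `T` with $nS = TT'$, computes the correlation matrix $R$ from it, and compares $|R|$ with the product of the complements of the squared step-down correlations obtained from (2.4). The helper names are hypothetical.

```python
import math


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    value = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[piv][c] == 0.0:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            value = -value
        value *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return value


def correlation_from_triangular(T):
    """Correlation matrix R belonging to nS = T T'."""
    p = len(T)
    nS = [[sum(T[i][k] * T[j][k] for k in range(p)) for j in range(p)]
          for i in range(p)]
    return [[nS[i][j] / math.sqrt(nS[i][i] * nS[j][j]) for j in range(p)]
            for i in range(p)]


def product_of_complements(T):
    """Product over i = 2..p of (1 - r_i^2), with r_i^2 taken from (2.4)."""
    prod = 1.0
    for i in range(2, len(T) + 1):
        row = T[i - 1]
        r2 = sum(t * t for t in row[:i - 1]) / sum(t * t for t in row[:i])
        prod *= 1.0 - r2
    return prod


# Illustrative lower triangular T with positive diagonal (an assumption).
T = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [-1.0, 2.0, 1.5]]
R = correlation_from_triangular(T)
print(det(R), product_of_complements(T))
```

The two printed numbers coincide up to rounding, so a test based on the product is a test based on $|R|$.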


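4. Confidence bounds associated with the test of independence for a
p-variate problem. For shortness, let us now denote by just $r_i$ the $r_{i\cdot 1,2,\ldots,i-1}$
defined in (2.3) and (2.4), by just $\beta_i$ (with components $\beta_{i1}, \beta_{i2}, \ldots, \beta_{i,i-1}$,
say) the $\beta_{i\cdot 1,2,\ldots,i-1}$ defined by (2.1), and by just $b_i$ (with components $b_{i1}$,
$b_{i2}, \ldots, b_{i,i-1}$, say) the $b_{i\cdot 1,2,\ldots,i-1}$ defined by (2.2). Assuming a general
(symmetric, p.d.) $\Sigma$, let us now transform the original variates $x_1, x_2, \ldots, x_p$ to a
new set $x_1^*, x_2^*, \ldots, x_p^*$ defined by

$$\begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \\ \vdots \\ x_p^* \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\beta_{21} & 1 & 0 & \cdots & 0 \\ -\beta_{31} & -\beta_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ -\beta_{p1} & -\beta_{p2} & -\beta_{p3} & \cdots\ -\beta_{p,p-1} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_p \end{bmatrix}. \tag{4.1}$$

Then it can be verified by induction or otherwise that the new variates are
uncorrelated and hence that the step-down correlations of the new variates,
$r^*_{i\cdot 1,2,\ldots,i-1}$, say, ($i = p, p - 1, \ldots, 2$) are independently distributed. With a
joint probability, $1 - \alpha$, say, let us now make the simultaneous statements:

$$r^{*2}_{i\cdot 1,2,\ldots,i-1} \le u \qquad (\text{for } i = p, p - 1, \ldots, 2). \tag{4.2}$$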


We have already seen that, given $u$, we can easily find $\alpha$ and also, given $\alpha$, we can
find $u$. In order to invert the typical component statement (4.2) and thus make a
confidence statement on $\beta_{i\cdot 1,2,\ldots,i-1}$, i.e., on $[\beta_{i1}, \ldots, \beta_{i,i-1}]$, we observe that the
multiple correlation coefficient between $x_i^*$ and $[x_1^*, x_2^*, \ldots, x_{i-1}^*]$ is the same
as that between $x_i^*$ and $[x_1, x_2, \ldots, x_{i-1}]$, since the starred variates in the
first square brackets are linear combinations of just the non-starred variates
in the second square brackets; this fact simplifies our calculation of the desired
expressions in terms of the $S$ matrix, i.e., the sample dispersion matrix of the
original variates. We may now use the results obtained in reference [4] in
connection with the confidence bounds on a (pseudo-) regression matrix of a $p$-set on
a $q$-set ($p \le q$) for a $(p + q)$-variate normal distribution. Let us take expression
(3.2) from reference [4] and renumber it as

$$c_{\max}^{1/2}(BB') - \lambda\, c_{\max}^{1/2}(S_{1\cdot 2})\, c_{\max}^{1/2}(S_{22}^{-1}) \le c_{\max}^{1/2}(\beta\beta') \le c_{\max}^{1/2}(BB') + \lambda\, c_{\max}^{1/2}(S_{1\cdot 2})\, c_{\max}^{1/2}(S_{22}^{-1}), \tag{4.3}$$

where $B$ and $\beta$ are the sample and population regression matrices of the $p$-set on
the $q$-set given, respectively, by $B = S_{12} S_{22}^{-1}$ and $\beta = \Sigma_{12} \Sigma_{22}^{-1}$; $S_{1\cdot 2}$ is the
sample “residual” matrix of the $p$-set on the $q$-set given by
$S_{1\cdot 2} = S_{11} - S_{12} S_{22}^{-1} S_{12}'$; $S_{22}$ is, of course, the sample dispersion matrix of the $q$-set.
Given a preassigned confidence coefficient, $1 - \alpha$, the value of $\lambda$ in (4.3) can be
obtained from the central distribution of the square of the largest canonical
correlation coefficient, for which recursion formulas are available [2].
For $p = 1$ and $q = i - 1$, (4.3) reduces to

$$(b_i b_i')^{1/2} - \lambda\, (s_{ii} - s_i S_{i-1}^{-1} s_i')^{1/2}\, c_{\max}^{1/2}(S_{i-1}^{-1}) \le (\beta_i \beta_i')^{1/2} \le (b_i b_i')^{1/2} + \lambda\, (s_{ii} - s_i S_{i-1}^{-1} s_i')^{1/2}\, c_{\max}^{1/2}(S_{i-1}^{-1}), \tag{4.4}$$

where $b_i$ and $\beta_i$ are defined in the opening paragraph of section 4, and

$$s_i(1 \times \overline{i-1}) = [s_{1i}\ s_{2i}\ \cdots\ s_{i-1,i}], \qquad S_{i-1}(\overline{i-1} \times \overline{i-1}) = \begin{bmatrix} s_{11} & \cdots & s_{1,i-1} \\ \vdots & & \vdots \\ s_{1,i-1} & \cdots & s_{i-1,i-1} \end{bmatrix},$$

and, finally, $\lambda = +[u/(1 - u)]^{1/2}$, where $u$ is obtained by the procedure outlined
in the sequel of (3.2.2). It then follows that the typical statement (4.2) implies (4.4),
and therefore simultaneous statements (4.2) for $i = p, p - 1, \ldots, 2$ will imply,
with a joint probability $\ge 1 - \alpha$, simultaneous confidence bounds (4.4) on
$(\beta_i \beta_i')^{1/2}$ for $i = p, p - 1, \ldots, 2$.
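As a small worked instance of the bounds (4.4), consider $i = 2$, where $S_{i-1}$ reduces to the scalar $s_{11}$, $b_2 = s_{12}/s_{11}$, and $c_{\max}(S_{1}^{-1}) = 1/s_{11}$. The sketch below is only an illustration under these assumptions; the function names and the input numbers are hypothetical, and $u$ is taken as given (for example from the trial-value procedure of section 3.2).

```python
import math


def lambda_from_u(u):
    """lambda = +[u / (1 - u)]^(1/2), with 0 <= u < 1."""
    return math.sqrt(u / (1.0 - u))


def bounds_for_i2(s11, s12, s22, u):
    """Lower and upper bound of (4.4) on |beta_2| in the case i = 2."""
    lam = lambda_from_u(u)
    b = s12 / s11                              # sample regression coefficient
    residual = s22 - s12 * s12 / s11           # s_ii - s_i S^{-1} s_i'
    half_width = lam * math.sqrt(residual) * math.sqrt(1.0 / s11)
    return abs(b) - half_width, abs(b) + half_width


# Hypothetical sample values chosen for easy arithmetic.
print(bounds_for_i2(4.0, 2.0, 5.0, 0.5))
```

A negative lower bound carries no information, since $(\beta_i \beta_i')^{1/2}$ is non-negative by definition.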

In equation (3.1) of reference [4], we may put $p = 1$ and $q = i - 1$ and choose
the vector $d_2$ given there in such a way as to make any one, any two, etc., and
finally any $(i - 2)$ components of $d_2$ equal to zero; if then we make the
corresponding transition from (3.1) to (3.2) given in reference [4], we will have,
along with each typical statement (4.4) above, truncated statements where any
one, two, etc., finally any $(i - 2)$ components of $\beta_i$ and $b_i$ have been deleted
without, however, disturbing the expressions that occur with $\lambda$. Thus, statement
(4.4) and the truncations mentioned above will result in $2^{i-1} - 1$ joint
confidence statements for given $i$. Since $i$ can take the values $p, p - 1, \ldots, 2$, we
will have, altogether, $\sum_{i=2}^{p} (2^{i-1} - 1) = 2^p - p - 1$ confidence statements with
a joint confidence coefficient $\ge 1 - \alpha$.
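The count of confidence statements can be confirmed by direct enumeration. In the sketch below (function names are hypothetical), each statement for a given $i$ is identified with the non-empty subset of the regressor variates $1, 2, \ldots, i - 1$ that is retained after truncation.

```python
from itertools import combinations


def truncated_statements(i):
    """Non-empty subsets of the variates 1..i-1 retained in a statement for variate i."""
    variates = range(1, i)
    return [subset for size in range(1, i)
            for subset in combinations(variates, size)]


def total_statements(p):
    """Total number of statements over i = 2, 3, ..., p."""
    return sum(len(truncated_statements(i)) for i in range(2, p + 1))


for p in (2, 3, 5, 8):
    print(p, total_statements(p), 2 ** p - p - 1)
```

For every $p$ tried, the enumerated total agrees with $2^p - p - 1$.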


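5. Independence in the $(p_1 + p_2 + \cdots + p_k)$-variate problem.


5.1. Independence (in distribution) of the step-down sets of canonical
correlation coefficients, under the null hypothesis. Starting from (2.6) in section 2 we
shall make a transformation from $S$ to a partitioned triangular matrix, $\tilde{T}$, given
by

$$nS(p \times p) = \begin{bmatrix} T_{11} & 0 & \cdots & 0 \\ T_{21} & T_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ T_{k1} & T_{k2} & \cdots & T_{kk} \end{bmatrix} \begin{bmatrix} T_{11} & 0 & \cdots & 0 \\ T_{21} & T_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ T_{k1} & T_{k2} & \cdots & T_{kk} \end{bmatrix}',$$

where the successive row blocks have $(p_1), (p_2), \ldots, (p_k)$ rows and the successive column
blocks have $(p_1), (p_2), \ldots, (p_k)$ columns.
The distribution of the elements of $\tilde{T}$, under a general $\Sigma$, will be given by

$$\text{const} \cdot \exp\left[ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} \tilde{T} \tilde{T}' \right] \prod_{i=1}^{p} t_{ii}^{\,n-i} \prod_{i \ge j = 1}^{k} dT_{ij}, \tag{5.1.1}$$

where $p = p_1 + p_2 + \cdots + p_k$, and each diagonal block $T_{ii}$ is itself triangular. Under
$H_0: \Sigma_{ij} = 0$ ($i \ne j = 1, 2, \ldots, k$), (5.1.1) reduces to

$$\text{const} \cdot \exp\left[ -\tfrac{1}{2} \sum_{i=1}^{k} \operatorname{tr} \Sigma_{ii}^{-1} (T_{i1}, \ldots, T_{ii})(T_{i1}, \ldots, T_{ii})' \right] \prod_{i=1}^{p} t_{ii}^{\,n-i} \prod_{i \ge j = 1}^{k} dT_{ij}. \tag{5.1.2}$$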
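For shortness, let us write the matrix (2.8.2) in the form

$$S_{ii}^{-1}\, S^{(i)\prime}\, S_{i-1}^{-1}\, S^{(i)}, \tag{5.1.3}$$

where

$$S^{(i)}(\overline{p_1 + \cdots + p_{i-1}} \times p_i) = \begin{bmatrix} S_{1i} \\ S_{2i} \\ \vdots \\ S_{i-1,i} \end{bmatrix} \tag{5.1.4}$$

and

$$S_{i-1}(\overline{p_1 + \cdots + p_{i-1}} \times \overline{p_1 + \cdots + p_{i-1}}) = \begin{bmatrix} S_{11} & \cdots & S_{1,i-1} \\ \vdots & & \vdots \\ S_{1,i-1}' & \cdots & S_{i-1,i-1} \end{bmatrix}.$$

Also, let us denote the $p_i$ characteristic roots of (5.1.3), ordered from the smallest
to the largest, by

$$[c_i^{(1)}, c_i^{(2)}, \ldots, c_i^{(p_i)}].$$

It then follows directly that

$$\begin{aligned}
n S_{ii} &= [T_{i1}\ \cdots\ T_{ii}]\,[T_{i1}\ \cdots\ T_{ii}]', \\
n S^{(i)\prime} &= [T_{i1}\ \cdots\ T_{i,i-1}] \begin{bmatrix} T_{11} & \cdots & 0 \\ \vdots & & \vdots \\ T_{i-1,1} & \cdots & T_{i-1,i-1} \end{bmatrix}', \\
n S_{i-1} &= \begin{bmatrix} T_{11} & \cdots & 0 \\ \vdots & & \vdots \\ T_{i-1,1} & \cdots & T_{i-1,i-1} \end{bmatrix} \begin{bmatrix} T_{11} & \cdots & 0 \\ \vdots & & \vdots \\ T_{i-1,1} & \cdots & T_{i-1,i-1} \end{bmatrix}'.
\end{aligned} \tag{5.1.5}$$

Substituting the expressions (5.1.5) into (5.1.3) (or (2.8.2)), we see that (5.1.3)
reduces to

$$\left[ \sum_{j=1}^{i} T_{ij} T_{ij}' \right]^{-1} \left[ \sum_{j=1}^{i-1} T_{ij} T_{ij}' \right]. \tag{5.1.6}$$

Now, since by (5.1.2) the sets $[T_{i1}\ \cdots\ T_{ii}]$ are independently distributed under
the null hypothesis, for $i = k, k - 1, \ldots, 1$, it follows that the sets of
characteristic roots of (5.1.3), viz.,

$$[c_2^{(1)}, \ldots, c_2^{(p_2)}],\ \ldots,\ [c_k^{(1)}, \ldots, c_k^{(p_k)}],$$

are independently distributed.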


5.2. The proposed test of independence. We propose the following test:

$$\text{Accept } H_0 \text{ over } \bigcap_{i=2}^{k} \left[ c_i^{(p_i)} \le \lambda \right], \text{ and reject } H_0 \text{ over } \bigcup_{i=2}^{k} \left[ c_i^{(p_i)} > \lambda \right],$$

where $\lambda$ is given by

$$\prod_{i=2}^{k} P\left[ c_i^{(p_i)} \le \lambda \mid \gamma_i^{(p_i)} = 0 \right] = 1 - \alpha, \tag{5.2.2}$$

and $\gamma_i^{(p_i)}$ denotes the largest characteristic root of
$\Sigma_{ii}^{-1} \Sigma^{(i)\prime} \Sigma_{i-1}^{-1} \Sigma^{(i)}$, where
$\Sigma^{(i)}$ and $\Sigma_{i-1}$ are obtained from $\Sigma$ in exactly the same way as $S^{(i)}$ and $S_{i-1}$ from $S$ in
equations (5.1.4). It will be noted that $\gamma_i^{(p_i)}$ is zero if and only if, for given
$i$, $\Sigma_{ji} = 0$, for $j = 1, 2, \ldots, i - 1$. Analogous to section 3.2, we take the same
value of $\lambda$ for each factor on the left-hand side of (5.2.2); the reason is the same as
that given in section 3.2. The procedure for obtaining $\lambda$ is analogous to that given
for $u$ in section 3.2 except that the incomplete Beta function needs to be
replaced by the central distribution function of the (square of the) largest canonical
correlation coefficient. The distribution and recursion relations for particular
values are discussed explicitly in reference [2].


5.3. Relation to the likelihood-ratio test. Denoting the $j$th canonical
correlation coefficient of the $p_i$-set on the $(p_1 + p_2 + \cdots + p_{i-1})$-set by $r^{(j)}_{i\cdot 1,2,\ldots,i-1}$,
we see that

$$r^{(j)2}_{i\cdot 1,2,\ldots,i-1} = c^{(j)}\left( S_{ii}^{-1} S^{(i)\prime} S_{i-1}^{-1} S^{(i)} \right) = c^{(j)}\left( R_{ii}^{-1} R^{(i)\prime} R_{i-1}^{-1} R^{(i)} \right), \tag{5.3.1}$$

where $c^{(j)}$ denotes the $j$th characteristic root, and the $R$'s are the sample
correlation matrices corresponding to the covariance matrices, $S$.
Thus, writing $R_i$ for the sample correlation matrix of the first $i$ sets,

$$\prod_{j=1}^{p_i} \left( 1 - r^{(j)2}_{i\cdot 1,2,\ldots,i-1} \right) = \frac{|R_i|}{|R_{ii}|\, |R_{i-1}|}, \tag{5.3.2}$$

and the product of the products of all step-down canonical correlation coefficients
(or rather, of the complements of their squares) becomes

$$\prod_{i=2}^{k} \prod_{j=1}^{p_i} \left( 1 - r^{(j)2}_{i\cdot 1,2,\ldots,i-1} \right) = \frac{|R|}{\prod_{i=1}^{k} |R_{ii}|}, \tag{5.3.3}$$

because $|R_1| = |R_{11}|$. Thus, a comparison with reference [6] shows that a test
based on the product of products of all step-down correlations is closely related to
the likelihood-ratio test. The distribution of this statistic, under $H_0$, is discussed
in [7], and the moments are given in [6]. They can be readily obtained if, in the
joint distribution of all $r_{ij}$'s (for a general matrix $(\rho_{ij}) = P$, say)


pr (np/2) a 
dP 
‘TI rj - i + 1/2) 
re = dpi-+-dBpn_ 
x I. _ [. (rR Dg RDg\”” . IL. dris, 


(where Dg is a diagonal matrix with elements e 





P(R| R) 


0 
fi 


The moments satisfy the convenient recurrence relation: 


a = de (m= +) 
oe | et & 
Tj-: [12 Pii(n —i +1)’ 


1 = be [[2-1 (n + 2a — g + 1) 
“TE Tl? (n+ 2a —i + 1) 


Thus, for testing the hypothesis of independence between k sets of normally dis- 
tributed variates, the determinant test is quite useful and closely related to the 
step-down procedure. At the moment, however, we cannot easily construct 
simultaneous confidence bounds on parametric functions on the basis of the 
determinant test, whereas the step-down procedure can be inverted into a 
simultaneous confidence statement. 





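6. Confidence bounds associated with the test of independence for a
$(p_1 + p_2 + \cdots + p_k)$-variate problem. Using (5.1.4) let us rewrite $\beta_{i\cdot 1,2,\ldots,i-1}$ in
(2.7) and $B_{i\cdot 1,2,\ldots,i-1}$ in (2.8) as

$$\beta_i(p_i \times \overline{p_1 + p_2 + \cdots + p_{i-1}}) = \Sigma^{(i)\prime} \Sigma_{i-1}^{-1}, \tag{6.1}$$

and

$$B_i(p_i \times \overline{p_1 + p_2 + \cdots + p_{i-1}}) = S^{(i)\prime} S_{i-1}^{-1}. \tag{6.2}$$

Next we partition $\beta_i$ into $[\beta_{i1}\ \beta_{i2}\ \cdots\ \beta_{i,i-1}]$, with column blocks of
$(p_1), (p_2), \ldots, (p_{i-1})$ columns, and $B_i$ into $[B_{i1}\ B_{i2}\ \cdots\ B_{i,i-1}]$ in the same way.

Assuming now a general (symmetric, p.d.) $\Sigma$, let us transform the original
variates $x_1(p_1 \times 1), x_2(p_2 \times 1), \ldots, x_k(p_k \times 1)$ into a new set of variates $x_1^*$,
$x_2^*, \ldots, x_k^*$ defined by

$$\begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_k^* \end{bmatrix} = \begin{bmatrix} I(p_1) & 0 & \cdots & 0 & 0 \\ -\beta_{21} & I(p_2) & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ -\beta_{k1} & -\beta_{k2} & \cdots & -\beta_{k,k-1} & I(p_k) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}. \tag{6.3}$$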

Then it can be verified by induction or otherwise that the $k$ sets of new (starred)
variates are uncorrelated, and hence the step-down sets of (squares of) canonical
correlations, $[c_2^{*(1)}, \ldots, c_2^{*(p_2)}]$, $[c_3^{*(1)}, \ldots, c_3^{*(p_3)}]$, $\ldots$, $[c_k^{*(1)}, \ldots, c_k^{*(p_k)}]$, are
independently distributed. With a joint probability, $1 - \alpha$, say, we may thus
make the simultaneous statement

$$c_i^{*(p_i)} \le \lambda \qquad (\text{for } i = k, k - 1, \ldots, 2). \tag{6.4}$$

Analogous to section 4, with the modifications given in 5.2, we can find $\lambda$ if $\alpha$ is
preassigned. By the same argument as in section 4, and by using (4.3) we can
obtain, with a joint confidence coefficient $\ge 1 - \alpha$, the following sets of
simultaneous confidence bounds, for $i = k, k - 1, \ldots, 2$:

$$c_{\max}^{1/2}(B_i B_i') - \lambda\, c_{\max}^{1/2}\!\left( S_{ii} - S^{(i)\prime} S_{i-1}^{-1} S^{(i)} \right) c_{\max}^{1/2}(S_{i-1}^{-1}) \le c_{\max}^{1/2}(\beta_i \beta_i') \le c_{\max}^{1/2}(B_i B_i') + \lambda\, c_{\max}^{1/2}\!\left( S_{ii} - S^{(i)\prime} S_{i-1}^{-1} S^{(i)} \right) c_{\max}^{1/2}(S_{i-1}^{-1}), \tag{6.5}$$

where $\beta_i$ and $B_i$ are defined by (6.1) and (6.2), $S^{(i)}$ and $S_{i-1}$ by (5.1.4), and
$\Sigma^{(i)}$ and $\Sigma_{i-1}$ analogously. Following the argument presented in section 3 of
reference [4] and in section 4 of this paper we see that, with a joint confidence
coefficient $\ge 1 - \alpha$, not only can we make the $(k - 1)$ statements (6.5) but, for
each typical statement under (6.5), we can also make a number of truncated
confidence statements by deleting any number of variates of the $(p_i)$-set and any
number of variates of the $(p_1 + p_2 + \cdots + p_{i-1})$-set taking care only that the
number of variates left in the $(p_1 + \cdots + p_{i-1})$-set is not less than that left in
the $p_i$-set.
SEQUENTIAL TESTS FOR VARIANCE RATIOS AND COMPONENTS
OF VARIANCE¹


By ALLAN BIRNBAUM 


Columbia University, New York, and Imperial College, London 


1. Introduction and summary. A general sequential sampling method is given 
for problems of comparing the variances of two or more normal populations in 
terms of ratios of variances. Sequential tests are given for a hypothesis specifying 
the ratio of two variances, including tests for variance components in the analysis 
of variance (Model II). Such tests provide savings in average required numbers 
of observations, relative to standard F tests, comparable to those typical of 
sequential probability ratio tests. 

2. Basic sequential sampling rule. Let $x' = (x_1', x_2', \ldots)$, $y' = (y_1', y_2', \ldots)$
be sequences of independent observations from two normal populations with
unknown means and unknown respective variances $\sigma_x^2$, $\sigma_y^2$. Let

$$x_n = \frac{x_1'}{\sqrt{n(n+1)}} + \cdots + \frac{x_n'}{\sqrt{n(n+1)}} - \frac{n\, x_{n+1}'}{\sqrt{n(n+1)}}$$

for $n = 1, 2, \ldots$. Let $y_1, y_2, \ldots$ be similarly defined as functions of $y_1', y_2', \ldots$.
Then $x = (x_1, x_2, \ldots)$, $y = (y_1, y_2, \ldots)$ are sequences of independent
observations, normally distributed with zero means and respective variances
$\sigma_x^2$, $\sigma_y^2$. (In case the original observations are from populations with known means
equal, say, to zero, $x$ and $y$ will denote the sequences of original observations.)
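The transformation from the original observations to the zero-mean sequence can be sketched as follows. This is a minimal illustration that assumes the transformation is the Helmert-type contrast $x_n = (x_1' + \cdots + x_n' - n x_{n+1}')/\sqrt{n(n+1)}$, $n = 1, 2, \ldots$; the function name `helmert` and the sample data are hypothetical.

```python
import math


def helmert(raw):
    """Map x'_1, ..., x'_{N} to x_1, ..., x_{N-1} by Helmert-type contrasts."""
    out = []
    partial = 0.0
    for n in range(1, len(raw)):
        partial += raw[n - 1]                  # x'_1 + ... + x'_n
        out.append((partial - n * raw[n]) / math.sqrt(n * (n + 1)))
    return out


raw = [2.0, 4.0, 9.0, 1.0]                     # hypothetical observations
x = helmert(raw)
mean = sum(raw) / len(raw)
print(sum(v * v for v in x), sum((r - mean) ** 2 for r in raw))
```

The two printed sums agree, so the sum of squares of the transformed values equals the sum of squared deviations of the original observations about their mean.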
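Let $g$ be a given positive number, and let

$$u = (u_1, u_2, \ldots) = (g x_1^2 + g x_2^2,\ g x_3^2 + g x_4^2,\ \ldots) \quad \text{and} \quad v = (v_1, v_2, \ldots) = (y_1^2 + y_2^2,\ y_3^2 + y_4^2,\ \ldots),$$

$$R_i = \sum_{j=1}^{i} u_j, \qquad S_i = \sum_{j=1}^{i} v_j, \qquad \text{for } i = 1, 2, \ldots.$$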
Let $T = (T_1, T_2, \ldots)$ be a nondecreasing sequence whose elements are those
of $\{R_i\} \cup \{S_i\}$. Since events of the form $R_i = R_j$, $S_i = S_j$, $i \ne j$, and $R_i = S_j$
have total probability zero, we have with probability one that $T$ is uniquely
determined,

$$T_1 = \min(R_1, S_1), \qquad T_2 = \begin{cases} \min(R_2, S_1) & \text{if } T_1 = R_1, \\ \min(R_1, S_2) & \text{if } T_1 = S_1, \end{cases} \qquad \ldots$$

Given $T$, let $B = (b_1, b_2, \ldots)$ be defined by

$$b_i = \begin{cases} 1 & \text{if } T_i = \text{some } R_j, \\ 0 & \text{if } T_i = \text{some } S_j, \end{cases} \qquad \text{for } i = 1, 2, \ldots.$$

Received December 14, 1954; revised September, 1956.

¹ Work sponsored under the Office of Naval Research, Contract N6onr-271, T. O. XI,
Project 042-034. Reproduction in whole or in part is permitted for any purpose of the
United States Government.

The statistical decision procedures to be described below are based on observed
values $b_i$ only.

A simple rule for sequential sampling of $x_i$'s and $y_i$'s so as to obtain the values
$b_i$ is the following:

Sampling rule 1:

1. Observe $u_1$ and $v_1$ (that is, observe $x_1$, $x_2$, $y_1$ and $y_2$, and compute
   $u_1 = g(x_1^2 + x_2^2)$ and $v_1 = (y_1^2 + y_2^2)$).

2. If an additional observation $b_i$ is required, then if $u_1 < v_1$, observe $u_2$;
   if $u_1 \ge v_1$, observe $v_2$.

3. Similarly at every stage, if an additional observation $b_i$ is required, then if
   $\sum u_i < \sum v_i$, observe an additional $u_i$, while if $\sum u_i \ge \sum v_i$, observe an
   additional $v_i$.

4. Discontinue sampling when the observations $b_1, \ldots, b_m$ thus far obtained
   suffice to determine a decision according to the particular decision
   procedure being used.

It is clear that to obtain $m$ observations $b_1, \ldots, b_m$, a total of $m + 1$
observations $u_i$ and $v_i$ are required; thus the above rule is most efficient for sampling
$u_i$'s and $v_i$'s to obtain $b_i$'s.


A minor gain in efficiency here becomes possible if the sampling rule is described
in terms of $x_i$'s and $y_i$'s. This follows from the following observation: If only
$x_1$, $x_2$ and $y_1$ have been observed, and are such that

$$u_1 = g(x_1^2 + x_2^2) < y_1^2 \le v_1,$$

then $u_1 < v_1$ and $b_1 = 1$ are known without need to observe $y_2$. Similarly if
$g x_1^2 > y_1^2 + y_2^2$ is observed, $b_1 = 0$ is known without need to observe $x_2$. It is
clear that the determination of $b_1$ in this way (that is, by observing whether
$u_1 < v_1$, without observing the exact numerical values of both $u_1$ and $v_1$) does
not alter any mathematical or statistical properties of $b_1$, since $b_1$ is defined in
terms of an inequality in $u_1$ and $v_1$. Analogous observations hold at each stage
of sampling, and are the basis for the following

Basic sequential sampling rule:

1. Observe $x_1$ and $y_1$.

2. If $g x_1^2 < y_1^2$, observe $x_2$; if $g x_1^2 \ge y_1^2$, observe $y_2$.
TABLE 1

Example of sampling rule (with $g = 1$) and computation of $b_i$'s.
(Observed values, in order of sampling.)

3. Similarly at every stage, letting $\sum g x_i^2$ and $\sum y_i^2$ denote summations over
   all observations thus far obtained, if $\sum g x_i^2 < \sum y_i^2$, observe an additional
   $x_i$; if $\sum g x_i^2 \ge \sum y_i^2$, observe an additional $y_i$.

4. Discontinue sampling when the observations $b_1, \ldots, b_m$ thus far obtained
   suffice to determine a decision according to the particular decision
   procedure being used.

A convenient tabular method for carrying out the sampling and computing
the required $b_i$'s is illustrated in Table 1, for the case $g = 1$ of the sampling rule.
The computation of $b_i$'s may be described thus: As soon as $2r$ or more $x_i$'s
have been observed and $\sum g x_i^2 - \sum y_i^2 < 0$ is observed, the $r$th unity value of
$b_i$ is observed. As soon as $2r$ or more $y_i$'s have been observed and
$\sum g x_i^2 - \sum y_i^2 \ge 0$ is observed, the $r$th zero value of $b_i$ is observed. In applications where
population means are unknown, the relation

$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n+1} \left( x_i' - \frac{1}{n+1} \sum_{j=1}^{n+1} x_j' \right)^2 = \sum_{i=1}^{n+1} x_i'^2 - \left( \sum_{i=1}^{n+1} x_i' \right)^2 \Big/ (n + 1)$$

allows simple application of the method directly to the original observations.
It is readily verified that this rule minimizes the number of $x_i$'s and $y_i$'s which
must be observed to determine the required $b_i$'s. Since this rule requires at least
$(2m + 1)$ and at most $(2m + 2)$ observations $x_i$ and $y_i$, it affords a saving of at
most one such observation (and in terms of expected number of observations, a
saving of a fraction of one observation) as compared with the preceding rule.
Hence rule 1 may often be preferred because of its greater simplicity. However,
only the basic sampling rule will be considered in the following sections.
Clearly the sampling rule depends on the given value of $g$. Criteria for the
choice of $g$ are discussed below.
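The basic sequential sampling rule can be sketched as a short routine. The following is one possible reading of the rule, assuming zero-mean sequences supplied as iterables and stopping as soon as $m$ values $b_i$ are known; the function name, its arguments, and the sample data are hypothetical.

```python
def basic_sampling_rule(x_values, y_values, g, m):
    """Return (b, n_x, n_y): the first m values b_i and the numbers of x's and y's used."""
    xs, ys = iter(x_values), iter(y_values)
    x1, y1 = next(xs), next(ys)                # step 1: observe x_1 and y_1
    sum_gx2, sum_y2 = g * x1 * x1, y1 * y1
    n_x = n_y = 1
    ones = zeros = 0
    b = []
    while True:
        if sum_gx2 < sum_y2:
            # 2r x's observed with the x-sum below the y-sum: r-th unity value.
            if n_x == 2 * (ones + 1):
                b.append(1)
                ones += 1
        else:
            # 2r y's observed with the x-sum not below the y-sum: r-th zero value.
            if n_y == 2 * (zeros + 1):
                b.append(0)
                zeros += 1
        if len(b) >= m:
            break
        if sum_gx2 < sum_y2:                   # observe an additional x
            x = next(xs)
            sum_gx2 += g * x * x
            n_x += 1
        else:                                  # observe an additional y
            y = next(ys)
            sum_y2 += y * y
            n_y += 1
    return b, n_x, n_y


# Hypothetical data with g = 1: R_1 = 2, R_2 = 12, S_1 = 8, S_2 = 13.
print(basic_sampling_rule([1, 1, 3, 1], [2, 2, 1, 2], g=1.0, m=3))
```

In this example the merged order of the partial sums is $R_1 < S_1 < R_2 < S_2$, so the first three values are $b = (1, 0, 1)$, and eight observations ($2m + 2$) are used.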


3. Distribution theory. $\tfrac{1}{2} u_1, \tfrac{1}{2} u_2, \ldots$ are independently distributed with
common density function

$$f(w) = \frac{1}{g \sigma_x^2}\, e^{-w / (g \sigma_x^2)}, \qquad w \ge 0.$$

Similarly $\tfrac{1}{2} v_1, \tfrac{1}{2} v_2, \ldots$ are independently distributed with common density
function

$$h(w) = \frac{1}{\sigma_y^2}\, e^{-w / \sigma_y^2}, \qquad w \ge 0.$$

Hence (cf. [1] or [2]):

(a) The sequences $u$ and $v$ are distributed as the “waiting times” between
    successive events in two independent Poisson processes with respective
    parameters
    $$\lambda_1 = \frac{1}{2 g \sigma_x^2} \quad \text{and} \quad \lambda_2 = \frac{1}{2 \sigma_y^2}.$$

(b) If two such processes are observed simultaneously as one process, the
    new process is Poisson with mean $\lambda = \lambda_1 + \lambda_2$ and waiting times $T_1$, $T_2 - T_1$,
    $T_3 - T_2, \ldots$.

(c) $b_i = 1$ (i.e. $T_i$ = some $R_j$) denotes that the $i$th event observed in the new
    process occurred in the first of the two original processes.

(d) The $b_i$'s are independent, with
    $$p = \text{Prob}\{b_i = 1\} = \left( 1 + \frac{\lambda_2}{\lambda_1} \right)^{-1} = \left( 1 + g \frac{\sigma_x^2}{\sigma_y^2} \right)^{-1}$$
    for all $i$.

Thus we may apply to $B$ any sequential or nonsequential methods for statistical
inferences concerning a binomial parameter $p$. By use of the relation
$\sigma_x^2/\sigma_y^2 = (1/g)(1/p - 1)$, any interval estimate of $p$ provides an interval estimate of
$\sigma_x^2/\sigma_y^2$, and any procedure for testing a hypothesis on $p$ provides a procedure
for testing a hypothesis on $\sigma_x^2/\sigma_y^2$. An unbiased estimate of $\sigma_x^2/\sigma_y^2$ is given by
$(1/g)(1/\hat{p} - 1)$, where $1/\hat{p}$ is an unbiased estimate of $1/p$ based on inverse
binomial sampling of $b_i$'s.

The generalization of the basic sampling rule and distribution theory to the
case of comparisons of three or more variances is immediate. In this
generalization, the distribution of $b_i$'s would be multinomial instead of binomial.
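The correspondence between the binomial parameter $p$ and the variance ratio $\sigma_x^2/\sigma_y^2$ can be written out directly. The sketch below assumes the relations $p = (1 + g\,\sigma_x^2/\sigma_y^2)^{-1}$ and $\sigma_x^2/\sigma_y^2 = (1/g)(1/p - 1)$; the function names are hypothetical. Because the ratio is a decreasing function of $p$, the endpoints of an interval for $p$ are swapped when it is converted.

```python
def p_from_ratio(ratio, g=1.0):
    """p = Prob{b_i = 1} for a given variance ratio sigma_x^2 / sigma_y^2."""
    return 1.0 / (1.0 + g * ratio)


def ratio_from_p(p, g=1.0):
    """Variance ratio sigma_x^2 / sigma_y^2 corresponding to a given p."""
    return (1.0 / p - 1.0) / g


def ratio_interval(p_low, p_high, g=1.0):
    """Convert an interval estimate (p_low, p_high) for p into one for the ratio."""
    return ratio_from_p(p_high, g), ratio_from_p(p_low, g)


print(p_from_ratio(0.3404), p_from_ratio(2.938))
print(ratio_interval(0.2, 0.5))
```

With $g = 1$, the ratios .3404 and 2.938 correspond to $p \approx .746$ and $p \approx .254$ respectively.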


4. Efficiency. A number of questions concerning the efficiencies of tests based 
on the above sampling rule are discussed in the following sections. 


4a. Comparisons with standard F-tests. The standard method of testing a
hypothesis $H_0: \sigma_x^2/\sigma_y^2 = \rho_0$, for any given $\rho_0 > 0$, is to take fixed numbers $n_x$,
$n_y$ of observations $x_i$, $y_i$ respectively, and to use the statistic

$$F = \frac{n_y \sum_{i=1}^{n_x} x_i^2}{n_x\, \rho_0 \sum_{i=1}^{n_y} y_i^2},$$

which under $H_0$ has the F-distribution with $n_x$, $n_y$ degrees of freedom. Tables
8.3 and 8.4 of [3] give the operating characteristics of such tests. For example,
to test $H_0: \sigma_x^2/\sigma_y^2 = .3404$ against $H_1: \sigma_x^2/\sigma_y^2 = (.3404)^{-1} = 2.938$, with $\alpha$ =
Type I error = .01 and $\beta$ = Type II error = .01, a total of $n_x + n_y = 40$
observations suffices provided $n_x = n_y = 20$. If the above sampling rule is used,
taking $g = 1$, then $H_0$ is equivalent to

$$H_0: p = \text{Prob}\{b_i = 1\} = p_0 = (1 + .3404)^{-1} = .746$$

and $H_1$ is equivalent to $H_1: p = p_1 = (1 + 2.938)^{-1} = .254$. By use of binomial
probability tables [4] we find
$\text{Prob}\{\sum_{i=1}^{21} b_i \ge 11 \mid p = .26\} = \text{Prob}\{\sum_{i=1}^{21} b_i \le 10 \mid p = .74\} = .0088$;
thus a nonsequential test of $H_0$ against $H_1$ based
on 21 observations $b_i$ has

$$\alpha = \beta < .0088 < .01.$$
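The binomial tail probability quoted above can be recomputed exactly rather than read from tables. The following sketch uses only the standard library; the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def upper_tail(k, n, p):
    """Prob{sum of b_i >= k}."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def lower_tail(k, n, p):
    """Prob{sum of b_i <= k}."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))


print(upper_tail(11, 21, 0.26))    # tabled parameter value
print(lower_tail(10, 21, 0.74))    # equal to the line above by symmetry
print(upper_tail(11, 21, 0.254))   # at p_1 itself, hence smaller
```

The first two values agree with the tabled .0088 to four decimals, and the value at $p_1 = .254$ is smaller, which is the sense of the inequality $\alpha = \beta < .0088$.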


The sampling rule requires either 43 or 44 observations $x_i$, $y_i$ to generate 21
observations $b_i$. Thus the efficiency of the standard F-test is approximately
matched by the nonsequential test based on $b_i$'s in this case. (More exact
comparisons of efficiency can be made, for example by computing the required exact
untabled binomial probabilities and using randomized binomial sample sizes,
but this does not seem necessary for present purposes.) Comparisons of this
kind are given for a number of other cases in Table 2 below. The properties of
the four F-tests are taken from [3], with $n_x = n_y$ in each case. The approximately
matching non-sequential binomial tests are each based on the case $g = 1$ of the
sampling rule; $m$ is the required binomial sample size in each case. Since for $m$


TABLE 2 


Approximately equivalent non Value of 
sequential! binomial tests 


° ° 
Ks 
z/ L 


Standard F-tests 


—— — he tT My 


Value of o2/e? Strength Value of p Strength = 2m \correspond 
ccniacien isis alee + 3 | ing exactly 


oo ae Har ny i , , ¢ 
He 8 Ho | 3 o to Hy 


Test 1 |.3404 2.93 ; ‘ .2¢ | .0119) . 9 43, 2.704 


.4707| 2.13 ‘ 67 33 0520 30 


Test 2 ols .05 5 .¢ lg .0318) .0537 39. .667 
.0835, .0532 ; 4.263 


0577, .0537 . .555 


.0541) .0597 Y.g 846 
.0403, .0544 al 3.000 
.0541) .4850 c a: .703 
.0403, .4807 81. .778 


on on on on 


.0119) .0099 179.5 | 4.882 
.0083 .0124 
0119 .0505 
.0083 .0432 
.0019, .4889 
.0083! .4777 







observations $b_i$ the sampling rule requires $2m + 1$ or $2m + 2$ observations $x_i$, $y_i$, $2m + 3/2$ ($\approx n_x + n_y$) is given for each binomial test for comparison with the $n_x + n_y$ required by the corresponding F-test. In each case investigated, the F-test is approximately matched in efficiency by a nonsequential binomial test based on $b_i$'s.

The F-tests are no doubt preferable to the nonsequential binomial tests, but evidently simplicity of application is the only important basis for this preference. Sequential probability ratio tests [5] are directly applicable to the $b_i$'s. Such tests of $H_0: p = p_0$ against $H_1: p = p_1 > p_0$ require average sample sizes of approximately $m/2$ or less when $p \le p_0$ and when $p \ge p_1$. Thus by use of the above sampling rule, with application of a sequential test to the $b_i$'s, gains in efficiency over the standard F-test are obtained. These gains can be calculated, to close approximation, by use of the average-sample-size function for a sequential binomial test on $b_i$'s corresponding to any given F-test.
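How a sequential probability ratio test operates on a 0/1 sequence such as the $b_i$'s can be sketched as follows. This is an illustration only, assuming Wald's usual approximate boundaries $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$; the function name `binomial_sprt` and the example values of $p_0$, $p_1$ are hypothetical and are not taken from the paper.

```python
from math import log

def binomial_sprt(bits, p0, p1, alpha, beta):
    """Sequential probability ratio test of H0: p = p0 against H1: p = p1 > p0.

    bits is an iterable of 0/1 observations. Returns (decision, n_used), where
    decision is "H0", "H1", or None if the data end before a boundary is hit.
    """
    upper = log((1 - beta) / alpha)   # accept H1 once the log ratio reaches this
    lower = log(beta / (1 - alpha))   # accept H0 once the log ratio falls to this
    llr = 0.0
    n = 0
    for b in bits:
        n += 1
        # Add the log likelihood ratio contribution of one observation.
        llr += log(p1 / p0) if b == 1 else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n

all_ones = binomial_sprt([1] * 10, 0.25, 0.75, 0.05, 0.05)
all_zeros = binomial_sprt([0] * 10, 0.25, 0.75, 0.05, 0.05)
undecided = binomial_sprt([1, 0, 1, 0], 0.25, 0.75, 0.05, 0.05)
```

The number of observations used is random, which is the source of the savings in average sample size relative to a fixed-size test.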

Two-sided sequential tests on $\sigma_x^2/\sigma_y^2$ based on the $b_i$'s would require two-sided sequential probability ratio tests on a binomial parameter. Such tests are not yet available, but can be constructed by application of the method of sections 4.1.2 and 4.1.3 of [5].


4b. Choice of the scale-factor g. Each of the binomial tests considered in the preceding sections was based on the particular case $g = 1$ of the basic sequential sampling rule. The efficiencies of such binomial tests will in general be still further increased by suitable choices of values of $g$. When a problem of testing a variance ratio $\rho = \sigma_x^2/\sigma_y^2$ is specified by given values of $\rho_0$, $\rho_1$, $\alpha$, and $\beta$, it is natural to define a best value of $g$ as follows: Consider the problem of testing $H_0: p = (1 + g\rho_0)^{-1}$ against $H_1: p = (1 + g\rho_1)^{-1}$, at strength $\alpha$, $\beta$. Let $n(g)$ be the binomial sample size required for a nonsequential test of strength (at least) $\alpha$, $\beta$. Then a best value of $g$ may be defined as one which minimizes $n(g)$. The calculation of an optimal value of $g$, by use of binomial probability tables, is elementary.

In the case $\alpha = \beta$, symmetry considerations suggest that $g = (\rho_0\rho_1)^{-1/2}$ is a best value, since this choice makes $p_0 = 1 - p_1$; the same conclusion can be reached more formally by use of the normal approximation to the binomial probability $\alpha = \beta$. This case occurs in some of the examples above: For example, to test $H_0: \rho = .4707$ against $H_1: \rho = (.4707)^{-1} = 2.124$ at strength $\alpha = \beta = .05$, the binomial test based on $g = 1$ requires the (nonsequential) sample size $m = 21$. Taking the non-optimal value $g = 21.24$, the corresponding binomial problem is one of testing $H_0: p = .09$ against $H_1: p = .02$ at strength $\alpha = \beta = .05$, for which a (nonsequential) binomial sample size $m = 50$ is required.
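The dependence of the binomial problem on $g$ can be explored numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: the names `p_from_rho`, `symmetric_g`, and `min_sample_size` are hypothetical, the test is assumed to reject $H_0$ when $\sum b_i \le c$ (appropriate because $\rho_1 > \rho_0$ gives $p_1 < p_0$), and exact binomial probabilities replace the rounded tabled values, so the sample sizes it finds need not agree with figures read from tables.

```python
from math import comb, sqrt

def p_from_rho(rho, g=1.0):
    """Binomial parameter p = (1 + g*rho)^(-1) induced by variance ratio rho."""
    return 1.0 / (1.0 + g * rho)

def symmetric_g(rho0, rho1):
    """Scale factor g = (rho0*rho1)^(-1/2), which makes p0 = 1 - p1."""
    return 1.0 / sqrt(rho0 * rho1)

def binom_cdf(m, c, p):
    """Prob{X <= c} for X ~ Binomial(m, p)."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(0, c + 1))

def min_sample_size(p0, p1, alpha, beta, max_m=500):
    """Smallest m admitting a cutoff c with Prob{X <= c | p0} <= alpha and
    Prob{X > c | p1} <= beta. Assumes p0 > p1. Returns None if none is found."""
    for m in range(1, max_m + 1):
        for c in range(0, m):
            a = binom_cdf(m, c, p0)
            if a > alpha:
                break  # Type I error only grows with c, so stop early
            if 1.0 - binom_cdf(m, c, p1) <= beta:
                return m
    return None

rho0, rho1 = 0.4707, 1.0 / 0.4707
n_at_g1 = min_sample_size(p_from_rho(rho0, 1.0), p_from_rho(rho1, 1.0), 0.05, 0.05)
n_at_g21 = min_sample_size(p_from_rho(rho0, 21.24), p_from_rho(rho1, 21.24), 0.05, 0.05)
```

Comparing `n_at_g1` with `n_at_g21` shows the loss incurred by a poorly chosen scale factor in this example.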

If only $\rho_0$ and $\rho_1$ are given, it is interesting to consider whether a value of $g$ exists which is best in the above sense simultaneously for all possible values of $(\alpha, \beta)$. This is a problem in the "comparison of experiments" [6, pp. 334-6]: For any $\rho_0$, $\rho_1$ ($\rho_1 \ne \rho_0$) and any two values $g_1$, $g_2$ of $g$ ($g_1 \ne g_2$), the "binomial dichotomy" experiment $E_1$, testing $H_0^{(1)}: p = (1 + g_1\rho_0)^{-1}$ against $H_1^{(1)}: p = (1 + g_1\rho_1)^{-1}$, can be shown to be "not comparable with" the experiment $E_2$, testing $H_0^{(2)}: p = (1 + g_2\rho_0)^{-1}$ against $H_1^{(2)}: p = (1 + g_2\rho_1)^{-1}$. It follows that a best value of $g$ depends on the desired strength $(\alpha, \beta)$ as well as on $\rho_0$, $\rho_1$ in any particular testing problem.

While the value $g = 1$ may not be optimal in those testing problems of Table 1 in which $\rho_1 \ne \rho_0^{-1}$ or $\alpha \ne \beta$, it is evident that this value suffices to provide the savings in expected numbers of observations pointed out in Section 4a above.


4c. Comparison with other sampling rules. It is of interest to compare the above sampling rule with the one considered by Girshick in [7, pp. 134-6]. Girshick's rule is: observe $x_1$ and $y_1$; if more observations are needed, observe $x_2$ and $y_2$; continue taking such pairs of observations $(x_i, y_i)$ as long as required to terminate a particular inference procedure. (For nonsequential F tests of the hypothesis $H_0: \sigma_x^2/\sigma_y^2 = \rho_0 < 1$ against $H_1: \sigma_x^2/\sigma_y^2 = \rho_0^{-1}$, with a total of $2n$ observations and equal Type I and Type II error rates, $n_x = n_y = n$ are optimal sample sizes. This case is formally the instance of Girshick's sampling rule in which the observation of exactly $n$ pairs $(x_i, y_i)$ is prescribed.) Girshick gave an optimal sequential probability ratio test based on this sampling rule for deciding which of the two variances $\sigma_x^2$, $\sigma_y^2$ is the larger, and showed that the power functions of such tests depend just on $(1/\sigma_x^2 - 1/\sigma_y^2)$, a parameter which is not of interest in most applications. This suggests that in order to obtain a test whose power function depends just on the variance ratio (which is generally the parameter of interest), we must

(a) use Girshick’s sampling rule and apply a test other than Girshick’s, which 

must then lack certain efficiency properties of the sequential probability 
ratio test, or 

(b) use some other sampling rule as a basis for a test. 

Rushton [8] and Johnson [9] have given procedures which represent alternative (a). Johnson's Procedure I is based on Girshick's sampling rule and the sequence of statistics

$$\frac{x_1^2}{y_1^2},\quad \frac{x_1^2 + x_2^2}{y_1^2 + y_2^2},\quad \cdots,$$

and consists of a sequential probability ratio test applied to this sequence of non-independent statistics. The average sample sizes required by this procedure are not known. Johnson also gives some alternative procedures based on Girshick's sampling rule which are evidently less efficient but have approximately known average sample sizes.

Alternative (b) is represented by the sampling rule and tests of the preceding 
sections. 

This comparison of sampling rules illustrates a general problem of designing sequential sampling rules which are appropriate and optimal for various problems of testing composite hypotheses. Other illustrations are provided by various procedures for comparing Poisson processes [2]. Some general methods for the design of appropriate sampling rules will be given in another paper.

The above sampling rule and tests illustrate a remark in [2, p. 257]: "... problems dealing with variances of normal populations have direct analogues in problems dealing with parameters of Poisson processes. ..." The above methods are analogues of Methods 1-3, pp. 261-2, in [2].


4d. Near-optimal properties. Consider the problem of testing $H_0: \sigma_x^2/\sigma_y^2 = \rho_0$ against $H_1: \sigma_x^2/\sigma_y^2 = \rho_1$, $\rho_1 > \rho_0 > 0$, at a given significance level $\alpha$. Suppose that sequential sampling is conducted according to the above basic sampling rule, with any given value of $g > 0$, and any given termination rule which with probability one leads to termination. Then, relative to the given sampling and termination procedure (i.e., relative to the corresponding given sample space), the present problem may be viewed as one of nonsequential testing between two composite hypotheses, on the basis of a single (vector) observation. The parameter space consists of points $(\rho, \tau)$, where $\rho = \sigma_x^2/\sigma_y^2$, and $\tau = \sigma_y^2$.

Consider the simple hypothesis $H_0^*: (\rho, \tau) = (\rho_0, \tau_0)$, and the simple alternative $H_1^*: (\rho, \tau) = (\rho_1, \tau_1)$, where $\tau_1 = \tau_0\,(\rho_0(\rho_1 + 1))/(\rho_1(\rho_0 + 1))$, and $\tau_0$ has any fixed positive value. By the Neyman-Pearson lemma, every best test of $H_0^*$ against $H_1^*$ has a critical region of the form

$$W_1 = \{(x, y) \mid \Lambda(x, y) \ge k\}$$

for some given $k \ge 0$, where

$$\Lambda(x, y) = \frac{f(x_1, \cdots, x_{n_x}, y_1, \cdots, y_{n_y} \mid H_1^*)}{f(x_1, \cdots, x_{n_x}, y_1, \cdots, y_{n_y} \mid H_0^*)}$$

is the ratio of likelihood functions. Such a test has significance level $\alpha^*$ which, when $k$ is suitably chosen, equals the prescribed $\alpha$. This test then has power $1 - \beta^*$ under $H_1^*$ which is the maximum attainable under the stated restrictions.

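On the other hand, a best test of $H_0^*$ against $H_1^*$ based only on the observed values of $b_i$'s provided by the given sampling procedure has a rejection region of the form

$$W_2 = \{(x, y) \mid \lambda'(x, y) \ge k\},$$

where

$$\lambda'(x, y) = \left(\frac{p_1}{p_0}\right)^{\sum b_i} \left(\frac{1 - p_1}{1 - p_0}\right)^{m - \sum b_i}$$

is the likelihood ratio of the observed $b_i$'s, with

$$p_j = \mathrm{Prob}\{b_i = 1 \mid H_j^*\} = (1 + g\rho_j)^{-1}, \quad j = 0, 1.$$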


The purpose of the present section is to show, in a qualitative and heuristic manner, that under appropriate restrictions

$$\mathrm{Prob}\{W_1 \mid H_j^*\} \approx \mathrm{Prob}\{W_2 \mid H_j^*\} \quad \text{for } j = 0, 1$$

and any $\tau_0$, which implies the approximate optimality of a test based on $b_i$'s only, for the given sequential sampling and termination rule. The following discussion could be formulated more quantitatively, but this seems unnecessary since
(a) the tests based on $b_i$'s have simple and known properties,
(b) the operating characteristics of tests based on $\Lambda(x, y)$ would apparently be difficult to determine exactly and more difficult to apply than the test on $b_i$'s,
(c) a qualitative indication of the approximate equivalence of the tests serves practically as an additional recommendation for use of the tests based on $b_i$'s.
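Now

$$\Lambda(x, y) = \left(\frac{\rho_0 + 1}{\rho_1 + 1}\right)^{n_x/2} \left(\frac{\rho_1(\rho_0 + 1)}{\rho_0(\rho_1 + 1)}\right)^{n_y/2} \exp\left\{-\frac{1}{2}\Big(\sum x_i^2 - \sum y_i^2\Big)\left(\frac{\rho_0 - \rho_1}{\tau_0\,\rho_0(\rho_1 + 1)}\right)\right\}.$$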


Now the case $g = 1$ of the basic sampling rule is such that at every stage of sampling the quantity $(\sum x_i^2 - \sum y_i^2)$ will be increased if it is negative and will be decreased if it is positive; thus this quantity will under all hypotheses have a distribution concentrated near the value zero, and the exponential factor of $\Lambda(x, y)$ will tend to have a value near unity. For cases $g \ne 1$ of the sampling rule, a change of scale in $x_i$ values, $x_i \to \sqrt{g}\,x_i$, before computation of $\Lambda(x, y)$ gives the same result. We continue the discussion just for the case $g = 1$.
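The role of the particular choice of $\tau_1$ can be checked numerically: with $\tau_1 = \tau_0\,\rho_0(\rho_1+1)/(\rho_1(\rho_0+1))$, the exponential part of the likelihood ratio involves the data only through $\sum x_i^2 - \sum y_i^2$. The sketch below is an illustration under assumptions that are not stated in this section: the observations are taken to be independent normal with zero means, with $\mathrm{Var}(x_i) = \rho\tau$ and $\mathrm{Var}(y_i) = \tau$. The function names and the data values are hypothetical.

```python
from math import log, pi

def log_lik(xs, ys, rho, tau):
    """Log likelihood assuming x_i ~ N(0, rho*tau), y_i ~ N(0, tau), independent."""
    vx, vy = rho * tau, tau
    lx = sum(-0.5 * log(2 * pi * vx) - x * x / (2 * vx) for x in xs)
    ly = sum(-0.5 * log(2 * pi * vy) - y * y / (2 * vy) for y in ys)
    return lx + ly

def tau1_from(tau0, rho0, rho1):
    """Alternative nuisance value tau_1 paired with tau_0 in the text."""
    return tau0 * rho0 * (rho1 + 1) / (rho1 * (rho0 + 1))

def log_lambda_closed(xs, ys, rho0, rho1, tau0):
    """Closed form: two power factors plus an exponent in sum(x^2) - sum(y^2)."""
    nx, ny = len(xs), len(ys)
    sx = sum(x * x for x in xs)
    sy = sum(y * y for y in ys)
    a = (rho0 + 1) / (rho1 + 1)
    b = rho1 * (rho0 + 1) / (rho0 * (rho1 + 1))
    expo = -0.5 * (sx - sy) * (rho0 - rho1) / (tau0 * rho0 * (rho1 + 1))
    return 0.5 * nx * log(a) + 0.5 * ny * log(b) + expo

# Arbitrary illustrative data and parameter values.
xs, ys = [0.5, -1.2, 0.3], [0.7, 1.1]
rho0, rho1, tau0 = 0.5, 2.0, 1.3
tau1 = tau1_from(tau0, rho0, rho1)
direct = log_lik(xs, ys, rho1, tau1) - log_lik(xs, ys, rho0, tau0)
closed = log_lambda_closed(xs, ys, rho0, rho1, tau0)
```

When $\sum x_i^2$ and $\sum y_i^2$ are nearly equal, the exponent is near zero and only the two power factors remain.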

Thus if the rule for termination of sampling is such that $n_x + n_y$ is not small,

$$\mathrm{Prob}\{\Lambda(x, y) \ge k \mid H_j^*\} \approx \mathrm{Prob}\left\{\left(\frac{\rho_0 + 1}{\rho_1 + 1}\right)^{n_x/2} \left(\frac{\rho_1(\rho_0 + 1)}{\rho_0(\rho_1 + 1)}\right)^{n_y/2} \ge k \,\Big|\, H_j^*\right\},$$

and the last quantity is independent of the "nuisance parameter" $\tau_j$, for $j = 0, 1$. From the definition of the $b_i$'s we readily obtain


m 


m 
Ny , N: 
a aie <_ mn < ly ‘ ne — 
- l1<sm > b S= and 5 l , 'b; < 


} 
35° 


Since 


forz = 0, 1, we have 


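$$\lambda'(x, y) = \left(\frac{p_1}{p_0}\right)^{\sum b_i} \left(\frac{1 - p_1}{1 - p_0}\right)^{m - \sum b_i} = \left(\frac{\rho_0 + 1}{\rho_1 + 1}\right)^{\sum b_i} \left(\frac{\rho_1(\rho_0 + 1)}{\rho_0(\rho_1 + 1)}\right)^{m - \sum b_i}.$$

Hence

$$\mathrm{Prob}\{\Lambda(x, y) \ge k \mid H_j^*\} \approx \mathrm{Prob}\{\lambda'(x, y) \ge k \mid H_j^*\}$$

for $j = 0, 1$, and for each value of the "nuisance parameter" $\tau_0$.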


5. Sequential Tests on Components of Variance. The testing problems arising in Model II of the analysis of variance, and their usual nonsequential solution based on F tests, are described, for example, by Mood in [10, Chapter 14]. The method of the preceding sections can be adapted to provide sequential tests for such problems, as indicated below. Such sequential tests provide savings like those described above in the required numbers of observations.

Consider the "one-way layout" problem in which

$$y_{ij} = \mu + a_i + e_{ij}, \quad \text{for } i = 1, 2, \cdots,\ j = 1, 2, \cdots.$$

Here $\mu$ is an unknown constant, $a_i$ and $e_{ij}$ are normally distributed random variables with zero means and unknown respective variances $\sigma_a^2$ and $\sigma_e^2$, and all $a_i$'s and $e_{ij}$'s are mutually independent. The statistical problem is to test a hypothesis specifying the value of $\sigma_a^2/\sigma_e^2$. For present purposes we consider that a doubly infinite array of the random variables $y_{ij}$ is available, and that we are free to take observations $y_{ij}$ throughout this array in any manner. Let


denote the unique ‘‘doubly infinite orthogonal matrix’’ characterized as indi- 
cated, i.e., by the requirements that ¢;; = 0 if 7 2 i + 2, that tf, > 0, and that 
Diet ts; = 0, for all 7, 7. Let the random variables (r; , r2, --- ) be defined by 


(%.fe,°*:) = T(yu, Y2,°°° ys 


i.€., Ta = Dojo tas; , fora = 1,2, --- . Let therandom variables (s; , se , 
be defined by 


, = T (12: “hale 


+1 ° ‘ — ; ’ ° 
i.€.,Sa = > jar tajeia fora = 1,2, ---. Thenall r,’s and s,’s have independ- 
Seager s : : oo 
ent normal distributions with zero means; and Var (r.) = o, = o,, Var (s,) = 
2 :. » 
Ge = Oo T Ge- 


The sequential test procedures given above can be applied directly to the sequences of $r_\alpha$'s and $s_\alpha$'s to test any hypothesis on $\rho = \sigma_s^2/\sigma_r^2$, say $H_0: \rho = \rho_0$ against $H_1: \rho = \rho_1 > \rho_0$, at specified size and power. But since

$$\frac{\sigma_s^2}{\sigma_r^2} = \frac{\sigma_a^2 + \sigma_e^2}{\sigma_e^2} = 1 + \frac{\sigma_a^2}{\sigma_e^2},$$

$H_0$ is equivalent to $H_0^*: \sigma_a^2/\sigma_e^2 = \rho_0 - 1$, and $H_1$ is equivalent to $H_1^*: \sigma_a^2/\sigma_e^2 = \rho_1 - 1$. (Here $\rho_0 \ge 1$ is required if $H_0^*$ is to be meaningful.) Thus the usual hypotheses of interest in terms of variance components, namely those specifying a positive value for $\sigma_a^2/\sigma_e^2 = \rho_0 - 1$, and those specifying $\sigma_a^2 = 0$ (and hence $\rho_0 = 1$), can be tested sequentially with gains in efficiency as described above.
SUMS OF POWERS OF INDEPENDENT RANDOM VARIABLES¹


By J. M. Shapiro
Ohio State University


1. Summary and introduction. Let $(x_{nk})$, $k = 1, \cdots, k_n$; $n = 1, 2, \cdots$ be a double sequence of infinitesimal random variables which are rowwise independent (i.e., $\lim_{n\to\infty} \max_{1 \le k \le k_n} P(|x_{nk}| > \epsilon) = 0$ for every $\epsilon > 0$, and for each $n$, $x_{n1}, \cdots, x_{nk_n}$ are independent). Let $S_n = x_{n1} + \cdots + x_{nk_n} - A_n$ where the $A_n$ are constants and let $F_n(x)$ be the distribution function of $S_n$. Necessary and sufficient conditions for $F_n(x)$ to converge to a distribution function $F(x)$ are known, and in particular we know that $F(x)$ is infinitely divisible.

In this paper we shall investigate the system of infinitesimal, rowwise independent random variables $(|x_{nk}|^r)$, $r \ge 1$. In particular we shall be interested in large values of $r$. Specifically, let $S_n^r = |x_{n1}|^r + \cdots + |x_{nk_n}|^r - B_n(r)$, where $B_n(r)$ are suitably chosen constants. Let $F_n^r(x)$ be the distribution function of $S_n^r$. Necessary and sufficient conditions for $F_n^r(x)$ to converge ($n \to \infty$) to a distribution function $F^r(x)$ are given, and also necessary and sufficient conditions for $F^r(x)$ to converge ($r \to \infty$) to a distribution function $H(x)$ are given. The form that $H(x)$ must take is obtained and under rather general conditions it is shown that $H(x)$ is a Poisson distribution. In any case it is shown that $H(x)$ is the distribution function of the sum of two independent random variables, one Gaussian and the other Poisson (including their degenerate cases).


2. Notation. Let $F(x)$ be any infinitely divisible distribution function with characteristic function $\varphi(t)$. According to the formulas of Lévy and Khintchine (cf. [1]) we know that $\varphi(t)$ has the following representation:

$$\varphi(t) = \exp\Big\{ i\gamma(\tau)t - \frac{\sigma^2 t^2}{2} + \int_{-\infty}^{-\tau} (e^{itu} - 1)\,dM(u) + \int_{\tau}^{\infty} (e^{itu} - 1)\,dN(u) + \int_{-\tau}^{0-} (e^{itu} - 1 - itu)\,dM(u) + \int_{0+}^{\tau} (e^{itu} - 1 - itu)\,dN(u) \Big\}, \tag{2.1}$$

where $M(u)$ and $N(u)$ are respectively nondecreasing functions in the intervals $(-\infty, 0)$, $(0, +\infty)$ which satisfy $M(-\infty) = N(+\infty) = 0$ and $\int_{-\epsilon}^{0} u^2\,dM(u) + \int_{0}^{\epsilon} u^2\,dN(u) < \infty$ for every $\epsilon > 0$; $\sigma$ is a nonnegative constant; $\tau$ and $-\tau$ are continuity points of $N(u)$ and $M(u)$; and $\gamma(\tau)$ is a constant depending only on $\tau$. It is well known that the distribution functions $F^r(x)$ and $H(x)$ referred to in Section 1 are infinitely divisible, and throughout this paper we let $M^r(u)$ and $N^r(u)$ be associated with $F^r(x)$ and $M^*(u)$ and $N^*(u)$ be associated with $H(x)$, through the formulas given for their characteristic functions analogous to (2.1).

Received August 28, 1957; revised November 12, 1957.
¹ Presented to the American Mathematical Society August 29, 1957.

We let $F_{nk}(x)$ and $F_{nk}^r(x)$ be the distribution functions of $x_{nk}$ and $|x_{nk}|^r$ respectively.

When speaking of a random variable (or its distribution function) being Poisson we shall mean it is either Poisson or its degenerate case (i.e., a random variable taking the value 0 with probability 1). The same applies to a Gaussian random variable (in this case the degenerate case is a random variable taking the value $m$ with probability 1).

If $K(x)$ is a nondecreasing function, when we write $\lim_{n\to\infty} K_n(x) = K(x)$ it is understood that this need only hold at continuity points of $K(x)$.


3. General results and proofs.

THEOREM 1. Let $\lim_{n\to\infty} F_n^r(x) = F^r(x)$ for $r \ge r_0 \ge 1$ and $\lim_{r\to\infty} F^r(x) = H(x)$, where $F^r(x)$ and $H(x)$ are distribution functions. Then $H(x)$ is the distribution function of the sum of two independent random variables one of which is Gaussian and the other Poisson.

We remark that Theorem 1 remains valid if we assume $\lim_{n\to\infty} F_n^r(x) = F^r(x)$ for some sequence of values of $r$ becoming infinite in place of this condition holding for $r \ge r_0$.

The proof of Theorem 1 requires the following lemma.

LEMMA 1. If we add to the hypothesis of Theorem 1 the condition that $\lim_{n\to\infty} F_n(x) = F(x)$, the conclusion of Theorem 1 holds.

Proof. Since $\lim_{n\to\infty} F_n(x) = F(x)$, by Theorem 1 on page 116 of [1], we see

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} F_{nk}(x) = M(x), \quad x < 0,$$

and

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} (F_{nk}(x) - 1) = N(x), \quad x > 0,$$

where $M(x)$ and $N(x)$ are given by (2.1). Now for $a \ge 0$,

$$F_{nk}^r(a) = P(|x_{nk}|^r \le a) = F_{nk}(a^{1/r}) - F_{nk}(-a^{1/r}-),$$

and for $a < 0$, $F_{nk}^r(a) = 0$. Thus for $x < 0$, $\lim_{n\to\infty} \sum_{k=1}^{k_n} F_{nk}^r(x) = 0$, and for $x > 0$,

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} [F_{nk}^r(x) - 1] = \lim_{n\to\infty} \sum_{k=1}^{k_n} [F_{nk}(x^{1/r}) - 1] + \lim_{n\to\infty} \sum_{k=1}^{k_n} [-F_{nk}(-x^{1/r}-)].$$

Now assume that $x^{1/r}$ and $-x^{1/r}$ are continuity points of $N(x)$ and $M(x)$ respectively. Note that the set of points $x > 0$ for which this is true is dense on the positive axis. For such $x$ we have $\lim_{n\to\infty} \sum_{k=1}^{k_n} [F_{nk}^r(x) - 1] = N(x^{1/r}) - M(-x^{1/r})$. We note that the function $N(x^{1/r}) - M(-x^{1/r})$ and the function

$$\sum_{k=1}^{k_n} (F_{nk}^r(x) - 1)$$

are both nondecreasing for $x > 0$ and hence $\lim_{n\to\infty} \sum_{k=1}^{k_n} [F_{nk}^r(x) - 1] = N(x^{1/r}) - M(-x^{1/r})$ at all continuity points, $x > 0$, of $N(x^{1/r}) - M(-x^{1/r})$. Now since $\lim_{n\to\infty} F_n^r(x) = F^r(x)$ we see by Theorem 1 on page 116 of [1] that $M^r(x) = 0$ and $N^r(x) = N(x^{1/r}) - M(-x^{1/r})$. (Note that since $\int_{-\epsilon}^{0} x^2\,dM(x) + \int_{0}^{\epsilon} x^2\,dN(x) < \infty$ it follows that for $r \ge 1$, $\int_{-\epsilon}^{0} x^2\,dM^r(x) + \int_{0}^{\epsilon} x^2\,dN^r(x) < \infty$.) Now since $\lim_{r\to\infty} F^r(x) = H(x)$, it follows by Theorem 2 on page 88 of [1] that $\lim_{r\to\infty} M^r(x) = M^*(x)$ and $\lim_{r\to\infty} N^r(x) = N^*(x)$ at continuity points of $M^*(x)$ and $N^*(x)$. This shows that $M^*(x) = 0$ and

$$N^*(x) = \lim_{r\to\infty} [N(x^{1/r}) - M(-x^{1/r})] = \begin{cases} N(1+) - M(-1-), & x > 1, \\ N(1-) - M(-1+), & 0 < x < 1. \end{cases}$$

This shows that $N^*(x)$ is constant for $0 < x < 1$ and for $x > 1$ and hence (since $M^*(-\infty) = N^*(+\infty) = 0$) we see that $N(1+) = 0$ and $M(-1-) = 0$. Thus we see $N^*(x)$ is either identically 0 or takes one jump at $x = 1$. (In fact if both $M(x)$ is continuous at $-1$ and $N(x)$ is continuous at $+1$ then $N^*(x) \equiv 0$; otherwise $N^*(x)$ takes one jump.) Now let $\sigma^*$ be the nonnegative constant associated with $H(x)$ by the formula (2.1). Then if $\sigma^* = 0$ and $N^*(x)$ takes one jump it is clear that $H(x)$ is Poisson or $H(x - m)$ is Poisson ($m$ a constant). If $\sigma^* = 0$ and $N^*(x) \equiv 0$, $H(x)$ is a unitary distribution. If $\sigma^* \ne 0$ and $N^*(x) \equiv 0$, $H(x)$ is Gaussian; and if $\sigma^* \ne 0$ and $N^*(x)$ takes one jump, then (cf. [1]) it follows that $H(x)$ is the distribution function of the sum of two independent random variables one Gaussian and the other Poisson. This proves the lemma.
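The limit $N^*(x) = \lim_{r\to\infty} [N(x^{1/r}) - M(-x^{1/r})]$ used in the proof can be seen on a concrete case. The sketch below is purely illustrative: the particular functions `N` and `M` are hypothetical choices (not from the paper) that satisfy the requirements of Section 2, with $N(u) = 0$ for $u \ge 1$, a jump of size `lam` at $u = 1$, and $M \equiv 0$.

```python
def N(u, lam=2.0):
    """Hypothetical Levy function on (0, inf): nondecreasing, N(+inf) = 0,
    N(u) = 0 for u >= 1, with a jump of size lam at u = 1."""
    if u <= 0:
        raise ValueError("N is defined for u > 0 only")
    return 0.0 if u >= 1 else -(1.0 - u) - lam

def M(u):
    """Hypothetical Levy function on (-inf, 0): identically zero here."""
    if u >= 0:
        raise ValueError("M is defined for u < 0 only")
    return 0.0

def N_r(x, r, lam=2.0):
    """N^r(x) = N(x^(1/r)) - M(-x^(1/r)) for x > 0."""
    root = x ** (1.0 / r)
    return N(root, lam) - M(-root)

below_one_small_r = N_r(0.5, 1)      # the original function at x = 0.5
below_one_large_r = N_r(0.5, 1000)   # close to N(1-) - M(-1+) = -lam
above_one_large_r = N_r(3.0, 1000)   # equals N(1+) - M(-1-) = 0
```

As $r$ grows, $N^r$ flattens on $(0, 1)$ toward the constant $-\lambda$ and is zero on $(1, \infty)$, so the limit has its only jump at $x = 1$, as in the Poisson case of the lemma.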

Proof of Theorem 1. Let $s = r_0$ and let $y_{nk} = |x_{nk}|^s$. Then $|x_{nk}|^r = |y_{nk}|^{r/s}$. Then for $r/s \ge 1$, under the conditions of Theorem 1 the conditions of Lemma 1 are satisfied with the system $(x_{nk})$ replaced by $(y_{nk})$. This proves Theorem 1.

LEMMA 2. If $\lim_{n\to\infty} F_n(x) = F(x)$, then for suitably chosen constants $B_n(r)$, $F_n^r(x)$ converges to a distribution function $F^r(x)$ if and only if²

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \left\{ \int_0^{\epsilon} x^{2r}\, d[F_{nk}(x) - F_{nk}(-x-)] - \left( \int_0^{\epsilon} x^{r}\, d[F_{nk}(x) - F_{nk}(-x-)] \right)^2 \right\} = \sigma_r^2 < \infty. \tag{3.1}$$

Proof. Suppose $\lim_{n\to\infty} F_n(x) = F(x)$ and that (3.1) holds. Then as in the proof of Lemma 1, $\lim_{n\to\infty} \sum_{k=1}^{k_n} F_{nk}^r(x) = 0 = M^r(x)$ for $x < 0$, and

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} (F_{nk}^r(x) - 1) = N(x^{1/r}) - M(-x^{1/r}) = N^r(x) \quad \text{for } x > 0.$$

(Here we consider $N^r(x)$ and $M^r(x)$ only as functions just defined and not, at this point, as being associated with any distribution function.) We see that $M^r(-\infty) = N^r(+\infty) = 0$ and that $\int_{-\epsilon}^{0} x^2\,dM^r(x) + \int_{0}^{\epsilon} x^2\,dN^r(x) < \infty$. Consider

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \left\{ \int_{|x| < \epsilon} x^2\, dF_{nk}^r(x) - \left( \int_{|x| < \epsilon} x\, dF_{nk}^r(x) \right)^2 \right\} \tag{3.2}$$

$$= \lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \left\{ \int_0^{\epsilon} x^2\, d[F_{nk}(x^{1/r}) - F_{nk}(-x^{1/r}-)] - \left( \int_0^{\epsilon} x\, d[F_{nk}(x^{1/r}) - F_{nk}(-x^{1/r}-)] \right)^2 \right\}$$

$$= \lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \left\{ \int_0^{\epsilon^{1/r}} x^{2r}\, d[F_{nk}(x) - F_{nk}(-x-)] - \left( \int_0^{\epsilon^{1/r}} x^{r}\, d[F_{nk}(x) - F_{nk}(-x-)] \right)^2 \right\} = \sigma_r^2,$$

using condition (3.1). (Note $r$ is fixed here.) Now by choosing

$$B_n(r) = \sum_{k=1}^{k_n} \int_{|x| < \tau} x\, dF_{nk}^r(x) - C_r + o(1),$$

where $C_r$ is a constant and $o(1) \to 0$ as $n \to \infty$, we see by Theorem 1 on page 116 of [1] that $F_n^r(x)$ converges to a distribution $F^r(x)$. (We note that $M^r(x)$, $N^r(x)$ and $\sigma_r^2$ are associated with $F^r(x)$ through the formulas (2.1).)

Now suppose that $F_n^r(x) \to F^r(x)$. Then again using the theorem of [1] referred to above we see that (3.2) holds and hence that (3.1) holds.

² We use the notation $\lim^{\sup}_{\inf}$ to mean that the indicated condition is to hold for both lim inf and lim sup.

THEOREM 2. Under the conditions of Lemma 2 a necessary and sufficient condition for the distribution functions $F^r(x)$ to converge ($r \to \infty$) to a distribution function $H(x)$ for suitably chosen constants $B_n(r)$, is that³

$$M(x) = 0 \ \text{ for } x < -1, \qquad N(x) = 0 \ \text{ for } x > 1, \qquad \lim_{r\to\infty} \sigma_r^2 = (\sigma^*)^2. \tag{3.3}$$

Furthermore $H(x)$ is Gaussian if $M(x)$ is continuous at $-1$ and $N(x)$ is continuous at $+1$, $H(x - m)$ is Poisson if $\sigma^* = 0$ and either $M(x)$ is discontinuous at $-1$ or $N(x)$ is discontinuous at $+1$ where $m$ is a constant, and $H(x)$ is the distribution function of the sum of two independent random variables, one Gaussian and the other Poisson otherwise.

Proof. Suppose $\lim_{r\to\infty} F^r(x) = H(x)$. Then as in the proof of Lemma 1 we see that $M(-1-) = 0$ and $N(1+) = 0$ and hence $M(x) = 0$ for $x < -1$ and $N(x) = 0$ for $x > 1$. Now by Theorem 2 on page 88 of [1] we have

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,r \to \infty} \left\{ \int_{-\epsilon}^{0} u^2\, dM^r(u) + \sigma_r^2 + \int_{0}^{\epsilon} u^2\, dN^r(u) \right\} = (\sigma^*)^2.$$

³ Same notation as in the proofs of Lemmas 1 and 2.
Now
{ -0 ne re 
¢f wadM'(u) +] ano} =| w@dNw") — M(-w")| 
\o—e “0 “0 
ei/r 
=| aNnq) - M(-w) 
0 
a 0 \ 
s «| yf dN(y) + [ vf dM(y) | forr > land0O <e <1. 
0 1 
Thus we see that $\liminf_{r\to\infty} \sigma_r^2 = (\sigma^*)^2 = \limsup_{r\to\infty} \sigma_r^2$.


Now suppose (3.3) holds. Then as in the proof of Lemma 1 we see

$$\lim_{r\to\infty} N^r(x) = N^*(x) = \begin{cases} N(1+) - M(-1-) & \text{for } x > 1, \\ N(1-) - M(-1+) & \text{for } 0 < x < 1, \end{cases}$$

and $\lim_{r\to\infty} M^r(x) = 0 = M^*(x)$. (Here we consider $M^*$ and $N^*$ as functions just defined and not at this point as being associated with $H(x)$.) Now from (3.3) it follows that $N^*(+\infty) = M^*(-\infty) = 0$ and $\int_{-\epsilon}^{0} x^2\, dM^*(x) + \int_{0}^{\epsilon} x^2\, dN^*(x) < \infty$. Also since

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,r \to \infty} \left\{ \int_{-\epsilon}^{0} u^2\, dM^r(u) + \int_{0}^{\epsilon} u^2\, dN^r(u) \right\} = 0$$

(from the first part of this proof), it follows from (3.3) that

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,r \to \infty} \left\{ \int_{-\epsilon}^{0} u^2\, dM^r(u) + \sigma_r^2 + \int_{0}^{\epsilon} u^2\, dN^r(u) \right\} = (\sigma^*)^2.$$

Now by Theorem 1 on page 116 of [1] we see that $\gamma_r(\tau) = \sum_{k=1}^{k_n} \int_{|x| < \tau} x\, dF_{nk}^r(x) - B_n(r) + o(1)$, where $\gamma_r(\tau)$ is associated with $F^r(x)$ through the formulas (2.1). Thus by the proper choice of $B_n(r)$, $\gamma_r(\tau)$ converges ($r \to \infty$) to some constant $\gamma^*(\tau)$ ($\tau$ fixed). But using Theorem 2 on page 88 of [1], we see that $\lim_{r\to\infty} F^r(x) = H(x)$, where $H(x)$ is the infinitely divisible distribution determined by $M^*$, $N^*$, $\gamma^*(\tau)$ and $(\sigma^*)^2$ given above. It remains to show the form for $H(x)$, but this follows as in the proof of Lemma 1.


4. Characterization of the Poisson distribution. In this section we give conditions which will insure that the distribution functions $F^r(x)$ will converge to the Poisson distribution. We use the same notation as in the previous sections. In particular $M(x)$ and $N(x)$ are associated with the distribution function $F(x)$, the limiting distribution of $F_n(x)$.

THEOREM 3. If $F_n(x)$ converges to $F(x)$, $M(x) = 0$ for $x < -1$, $N(x) = 0$ for $x > 1$, and

$$\sum_{k=1}^{k_n} \int_{|x| < \epsilon} |x|^s\, dF_{nk}(x) \ \text{ is bounded in } n \text{ for some } s < 2r, \tag{4.1}$$

then for suitably chosen constants $B_n(r)$, $F_n^r(x)$ converges ($n \to \infty$) to a distribution function $F^r(x)$ and $F^r(x)$ converges ($r \to \infty$) to the Poisson distribution. (No matter what the choice of $B_n(r)$, if $F_n^r(x) \to F^r(x)$ and $F^r(x) \to H(x)$, then there exists a constant $m$ such that $H(x - m)$ is Poisson.)

We postpone the proof of Theorem 3 as well as that of the next three theorems. In the rest of the paper it will be convenient to assume $r > 1$.

THEOREM 4. Condition (4.1) of Theorem 3 may be replaced by

(4.2) The random variables $(x_{nk})$ are symmetric about the origin.

THEOREM 5. Let $S_n = x_{n1} + \cdots + x_{nk_n}$ (i.e., let $A_n = 0$) and suppose $F_n(x) \to F(x)$. Let $N(x) = 0$ for $x > 1$. Then if the $(x_{nk})$ are positive random variables the conclusion of Theorem 3 holds.

THEOREM 6. Let $A_n = 0$, $F_n(x) \to F(x)$, $M(x) = 0$ for $x < -1$, and $N(x) = 0$ for $x > 1$. Then if the $(x_{nk})$ are identically distributed within each row the conclusion of Theorem 3 holds.

Proof of Theorem 3. We first show that condition (4.1) implies condition (3.1) of Lemma 2 with $\sigma_r^2 = 0$. We have

$$\sum_{k=1}^{k_n} \left\{ \int_0^{\epsilon} x^{2r}\, d[F_{nk}(x) - F_{nk}(-x-)] - \left( \int_0^{\epsilon} x^{r}\, d[F_{nk}(x) - F_{nk}(-x-)] \right)^2 \right\}$$

$$\le \sum_{k=1}^{k_n} \int_0^{\epsilon} x^{2r}\, d[F_{nk}(x) - F_{nk}(-x-)]$$

$$\le \epsilon^{2r - s} \sum_{k=1}^{k_n} \int_0^{\epsilon} |x|^{s}\, d[F_{nk}(x) - F_{nk}(-x-)]$$

$$= \epsilon^{2r - s} \sum_{k=1}^{k_n} \left\{ \int_0^{\epsilon} |x|^{s}\, dF_{nk}(x) + \int_{-\epsilon}^{-0} |x|^{s}\, dF_{nk}(x) \right\},$$

and since $2r - s > 0$ we see by (4.1) that (3.1) holds with $\sigma_r = 0$. Thus from Lemma 2, $F_n^r(x) \to F^r(x)$. Also since $\lim_{r\to\infty} \sigma_r^2 = 0 = (\sigma^*)^2$, it follows from Theorem 2 that $F^r(x) \to H(x)$ and that $H(x - m)$ is a Poisson distribution. (This includes the possibility that $H(x)$ may be a degenerate Gaussian distribution.) We note that $B_n(r)$ could be chosen so as to make $m = 0$. This proves the theorem.


Proof of Theorem 4. We only need to show that (4.2) implies (4.1). Let $a_{nk} = \int_{|x| < \tau} x\, dF_{nk}(x)$ for some $\tau > 0$. By Theorem 2 on page 111 of [1] we have

$$\sum_{k=1}^{k_n} \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2}\, dF_{nk}(x + a_{nk})$$

is bounded. But since the random variables are symmetric it follows that $a_{nk} = 0$ and hence

$$\sum_{k=1}^{k_n} \int_{|x| < \epsilon} x^2\, dF_{nk}(x) \le (1 + \epsilon^2) \sum_{k=1}^{k_n} \int_{|x| < \epsilon} \frac{x^2}{1 + x^2}\, dF_{nk}(x) \le (1 + \epsilon^2) \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2}\, dF_{nk}(x)$$

is bounded. Thus (4.1) holds with $s = 2$, i.e., for $r > 1$.

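Proof of Theorem 5. Since the $x_{nk}$ are positive it follows from Theorem 1 on page 116 of [1] that $M(x) = 0$ for $x < 0$, and that $\sum_{k=1}^{k_n} \int_{|x| < \tau} x\, dF_{nk}(x) = \sum_{k=1}^{k_n} \int_0^{\tau} x\, dF_{nk}(x)$ converges to a constant $\gamma(\tau)$ (note $A_n = 0$). Thus

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} \left[ \int_{|x| < \epsilon} x\, dF_{nk}(x) \right]^2 = \lim_{n\to\infty} \sum_{k=1}^{k_n} \left[ \int_0^{\epsilon} x\, dF_{nk}(x) \right]^2 \le \lim_{n\to\infty} \left[ \max_{1 \le k \le k_n} \int_0^{\epsilon} x\, dF_{nk}(x) \right] \left[ \lim_{n\to\infty} \sum_{k=1}^{k_n} \int_0^{\epsilon} x\, dF_{nk}(x) \right] = 0,$$

since $\lim_{n\to\infty} \max_{1 \le k \le k_n} \int_0^{\epsilon} x\, dF_{nk}(x) = 0$ (infinitesimalness). Now again from Theorem 1 on page 116 of [1] we have

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \left\{ \int_{|x| < \epsilon} x^2\, dF_{nk}(x) - \left( \int_{|x| < \epsilon} x\, dF_{nk}(x) \right)^2 \right\} = \sigma^2$$

so that

$$\lim_{\epsilon \to 0} \; \lim\nolimits^{\sup}_{\inf}{}_{\,n \to \infty} \sum_{k=1}^{k_n} \int_{|x| < \epsilon} x^2\, dF_{nk}(x) = \sigma^2 < \infty.$$

Thus $\sum_{k=1}^{k_n} \int_{|x| < \epsilon} x^2\, dF_{nk}(x)$ is bounded in $n$ so that (4.1) holds with $s = 2$. This proves Theorem 5.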
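Proof of Theorem 6. Since $A_n = 0$ we again have

$$\sum_{k=1}^{k_n} \int_{|x| < \tau} x\, dF_{nk}(x) = k_n \int_{|x| < \tau} x\, dF_{n1}(x) \to \gamma(\tau).$$

Also

$$\lim_{n\to\infty} \sum_{k=1}^{k_n} \left( \int_{|x| < \epsilon} x\, dF_{nk}(x) \right)^2 = \lim_{n\to\infty} k_n \left( \int_{|x| < \epsilon} x\, dF_{n1}(x) \right)^2 = \lim_{n\to\infty} k_n \int_{|x| < \epsilon} x\, dF_{n1}(x) \cdot \lim_{n\to\infty} \int_{|x| < \epsilon} x\, dF_{n1}(x) = \gamma(\tau) \cdot 0 = 0,$$

since the random variables $(x_{nk})$ are infinitesimal. From this point the proof is identical to that of Theorem 5.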

The next theorem shows the existence of a double sequence of random variables $(|x_{nk}|^{r_n})$ such that the distribution functions of the row sums (minus a constant) converge to the Poisson distribution.

THEOREM 7. Under the conditions of any one of the Theorems 3 through 6 there exists a sequence of numbers $r_n \to \infty$ such that the distribution functions of the sums $|x_{n1}|^{r_n} + \cdots + |x_{nk_n}|^{r_n} - B_n(r_n)$ ($B_n(r_n)$ suitably chosen constants) converge to the Poisson distribution.

Proof. We have $\lim_{n\to\infty} F_n^r(x) = F^r(x)$ and $\lim_{r\to\infty} F^r(x) = H(x)$, where $H(x)$ is a Poisson distribution. (In particular the first limit relation holds for $r = 2, 3, \cdots$.) Let $\{\xi_k\}$, $k = 1, 2, \cdots$, be a countable dense set on the real line such that $F_n^r(\xi_k) \to F^r(\xi_k)$ for $r = 2, 3, \cdots$ and $F^r(\xi_k) \to H(\xi_k)$ for all $k$. Let $\{\epsilon_r\}$ be a positive decreasing sequence of real numbers such that $\epsilon_r \to 0$ as $r \to \infty$. Let $\{n_r\}$ be an increasing subsequence of the positive integers such that $n \ge n_r$ implies that $|F_n^r(\xi_k) - F^r(\xi_k)| < \epsilon_r$ for $k = 1, 2, \cdots, r$ ($r$ fixed). Consider the sequence of distribution functions $S$: $F_1^2(x), F_2^2(x), \cdots, F_{n_3 - 1}^2(x), F_{n_3}^3(x), \cdots, F_{n_4 - 1}^3(x), F_{n_4}^4(x), \cdots, F_{n_5 - 1}^4(x), \cdots$. We claim this sequence converges to $H(x)$ for $x = \xi_k$ for $k = 1, 2, \cdots$. Consider $\xi_k$. Let $\epsilon > 0$ be given. Let $r_0$ ($r_0 > k$) be such that $r \ge r_0$ implies $\epsilon_r < \epsilon/2$. Let $r_1 \ge r_0$ be such that $r \ge r_1$ implies $|F^r(\xi_k) - H(\xi_k)| < \epsilon/2$. Then for $n > N(\xi_k) = n_{r_1}$, consider

$$|F_n^r(\xi_k) - H(\xi_k)| \le |F_n^r(\xi_k) - F^r(\xi_k)| + |F^r(\xi_k) - H(\xi_k)|.$$

Since we are considering only elements of the sequence $S$ we have $n > n_{r_1}$ implies $r \ge r_1 \ge r_0 > k$. Therefore $|F_n^r(\xi_k) - F^r(\xi_k)| < \epsilon_r < \epsilon/2$ and $|F^r(\xi_k) - H(\xi_k)| < \epsilon/2$. Thus $|F_n^r(\xi_k) - H(\xi_k)| < \epsilon$ and we see that the sequence $S$ converges to $H(x)$ for $x = \xi_k$, $k = 1, 2, \cdots$. But since $\{\xi_k\}$ is dense, the sequence $S$ converges to $H(x)$ at every continuity point of $H(x)$. Now if we let $r_n = 2$ for $n = 1, \cdots, n_3 - 1$ and $r_n = m$ for $n = n_m, \cdots, n_{m+1} - 1$ ($m > 2$), we see that the distribution function of $|x_{n1}|^{r_n} + \cdots + |x_{nk_n}|^{r_n} - B_n(r_n)$ is $F_n^{r_n}(x)$, which is the $n$th element of the sequence $S$. This proves the theorem.
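The choice of the exponents $r_n$ at the end of the proof is a simple step-function rule, which can be sketched as follows. The function name `power_schedule` and the threshold values in the example are hypothetical; in the proof the thresholds $n_3 < n_4 < \cdots$ come from the convergence of $F_n^r$ at the points $\xi_k$, which this sketch does not compute.

```python
def power_schedule(n_r, n_max):
    """Return [r_1, ..., r_{n_max}] where r_n = 2 for n < n_3 and
    r_n = m for n_m <= n < n_{m+1}.

    n_r maps each r >= 3 to its threshold n_r; thresholds must increase with r.
    """
    rs = sorted(n_r)
    schedule = []
    for n in range(1, n_max + 1):
        r = 2
        for m in rs:
            if n >= n_r[m]:
                r = m  # the largest m whose threshold has been reached
        schedule.append(r)
    return schedule

# Hypothetical thresholds n_3 = 4, n_4 = 6, n_5 = 9.
example_schedule = power_schedule({3: 4, 4: 6, 5: 9}, 10)
```

The schedule is nondecreasing and tends to infinity with $n$ whenever every threshold is finite, which is all the theorem requires of $r_n$.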
THE STAIRCASE DESIGN: THEORY


By Franklin A. Graybill and William E. Pruitt
Oklahoma State University


0. Summary and Introduction. One of the most popular designs in experimental work is the randomized block. These designs can be put into three broad classes, viz., the complete block design, the balanced incomplete block design, and the partially balanced incomplete block design. These designs are all special cases of the general two-way classification with unequal numbers in the subclasses, but since the analysis of this general classification is quite complex, these special cases have evolved which are adequate to fit most needs, and the analysis of these special designs is relatively easy [1], [2], [6], [8].

However, most of the block designs considered to date have one feature in 
common—they require each block to contain an equal number of experimental 
units. The exceptions are given in [9], [10], where designs are considered in 
which the number of experimental units in blocks differ by one. The purpose 
of this paper is to extend the randomized block design to include the case where 
all blocks do not contain the same number of experimental units. We have 
called this the staircase design. 

Suppose an experimenter, wishing to run an experiment using N treatments, 
decides to use a randomized block design, but after arranging his material 
into homogeneous groups he finds that he has blocks available which have 
varying number of experimental units. The experimenter has various courses 
open to him: (1) If enough blocks are available with N or more experimental 
units he can discard the extra units in these blocks, discard all the blocks which 
have less than N units, and use a randomized complete block design; (2) He 
can discard units in the blocks until he has enough units and blocks 
for a balanced incomplete block or a partially balanced incomplete block de- 
sign; (3) He can use all the experimental units and use the staircase design 
proposed in this paper. 

For example, if an experimenter has N treatments with which he wishes to
experiment using a randomized block design, and if he has blocks of unequal
size, then he must rank his N treatments in the order of their importance, i.e.,
$T_1, T_2, \dots, T_N$, where he considers $T_1$ the most important and $T_N$
the least important. Now suppose he has at his disposal $b_1$ blocks which each
contain N experimental units. Then all N treatments are randomized in each of
the $b_1$ blocks. Suppose further that he has $b_2$ blocks which each contain
$N_1$ experimental units ($N_1 < N$). Then the first $N_1$ treatments are
arranged at random in each of the $b_2$ blocks. This process is continued until
all the blocks are used.

A particular example where this would be useful is an experiment involving
animals as experimental units where a block consists of litter mates. Let us
suppose that we have two litters of size seven, three of size five, and one of
size four. Using the staircase design we can include seven treatments and still
have the four we are most interested in replicated six times.


1. Notation. Consider a two-way classification model

(1.1)  $Y_{ij} = \mu + \beta_i + \alpha_j + e_{ij}; \quad i = 1, 2, \dots, c_j; \quad j = 1, 2, \dots, N,$

where $\mu$, $\beta_i$, $\alpha_j$ are constants and the $e_{ij}$ are normal
independent variables with means zero and variances $\sigma^2$. Also the j's
will be ordered in such a way that $c_j \ge c_{j'}$ for $j < j'$. The purpose of
this paper is,

1. to derive the least squares method for testing the hypothesis
   $\alpha_1 = \alpha_2 = \dots = \alpha_N$ under the model given above and to
   give the power function of this test.
2. to derive the best, linear, unbiased estimates for $\alpha_j - \alpha_{j'}$,
   and the variances of these estimates.

First we will separate the j's into subsets such that j and j' will be in the
same subset if and only if $c_j = c_{j'}$. Each of these subsets will be called
a step. We will designate the number of steps as k.

Let

$c_j = M^1$ for $j = 1, 2, \dots, N_1$;
$c_j = M^2$ for $j = N_1 + 1, N_1 + 2, \dots, N_1 + N_2$;
$\dots$
$c_j = M^k$ for $j = N_1 + N_2 + \dots + N_{k-1} + 1, \; N_1 + N_2 + \dots + N_{k-1} + 2, \; \dots, \; N_1 + N_2 + \dots + N_k$,
where 


& 
=N; > M'N, = N* 


t=] 


ai 0, mM = 0, 


for += 1,2,---,M',j = N°'+1,N°' 42, 
for i= 1,2,---, Mj = 1,2,---,N’ 
for j = N*'+1,N*'4+2,---N’, 
for j = 1,2,---,N’. 
Fig. 1 will serve to illustrate some of the notation. It will be noticed that
$c_j$ is the number of blocks in which treatment j appears; $M^t$ is the number
of blocks in the tth step; $N_t$ is the number of treatments in the tth step.
Also $Y^t_{ij}$ is the observation of the jth treatment which appears in the ith
block of the tth step. It may be helpful to note further that Y;} is a subset of
Yi;; Yij is a subset of the union of Yj; and Yi; , Y;3 is a subset of the union
of Yi; and Yi, and Yi; , etc.

A subscript replaced by a dot indicates the mean of the elements when 
summed over the range of the replaced subscript, eg. 


M* N2 


D Ly Yi 


_, tal j=l 


“MBN? * 


Since superscripts are being used in abundance, a Y, M, N, or $\alpha$ that is
raised to a power will always be enclosed in the appropriate brackets.

If, in a summation, the lower limit of summation should exceed the upper 
limit of summation, the sum will be zero. 

The notation used in Section 3 is that used by Kempthorne [4], pages 79- 
82, with the following exceptions. To be consistent with the notation given 
above, the normal equations are divided by a constant to give them in terms 
of means instead of totals. $Q_j^t$ will refer to only the $Q_j$ where
$j = N^{t-1} + 1, N^{t-1} + 2, \dots, N^t$.


2. The test function and its distributional properties. The purpose of this
section is to give a test of the hypothesis
$\alpha_1 = \alpha_2 = \dots = \alpha_N$ and to derive the distributional
properties of the test function. The proof that this test is the same as that
given by the method of least squares will be given in the next section.

FIG. 1. (Diagram of the staircase layout; axes: BLOCKS and TREATMENTS.)

Consider the following quadratic forms: 
mM? nt 
q=L db (Y%-¥i.-¥5 + YD’, 


t=] jaent¢—141 


2 NN py ! 

d= at 2b (vd — vit — vi + vty, 
d , i=l 

” Né 

qg=M > (Yi — ¥%), 


j=mNt-lil 


MN Nias pepe rt+ly\2 
= re (WS — YY, 
mM¢ 

ie ae (Ney! 4 N, Yt)? 


N! ; 
jim—M ttl} 
Mt nt 


qt Zz a (Yi,)’, 


im] jent—lit 


We will prove the following: 
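THEOREM I. If

(2.1)  $v = \dfrac{\sum_{t=1}^{k} q_3^t + \sum_{t=1}^{k-1} q_4^t}{\sum_{t=1}^{k} q_1^t + \sum_{t=1}^{k-1} q_2^t} \cdot \dfrac{(M^1 - 1)(N - 1) - \sum_{t=2}^{k} (M^1 - M^t) N_t}{N - 1},$

then v is distributed as $F'_{p,q,\lambda}$, where $F'_{p,q,\lambda}$ represents
the non-central F with degrees of freedom p and q and non-centrality $\lambda$
[7], also

$p = N - 1, \quad q = (M^1 - 1)(N - 1) - \sum_{t=2}^{k} (M^1 - M^t) N_t,$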


k A t nt k—1 A t+1 . ia “4 F 
=> [26 2, @i — o*] +E [eel = at"] 


t=] 20° j=Nt-1l41 t 


(2.2) 


and $\lambda = 0$ if and only if $\alpha_1 = \alpha_2 = \dots = \alpha_N$.
Proor. It is clear that 


(2.3) Yet Uthat odtLd=Le. 


t=1 


Now it is easily shown that the rank of $q_1^t$ is $(M^t - 1)(N_t - 1)$, the
rank of $q_2^t$ is $(M^{t+1} - 1)$, the rank of $q_3^t$ is $(N_t - 1)$, the rank
of $q_4^t$ is 1, and the rank of $q_5^t$ is $(M^t - M^{t+1})$. Adding we see that

$\sum_{t=1}^{k} (M^t - 1)(N_t - 1) + \sum_{t=1}^{k-1} (M^{t+1} - 1) + \sum_{t=1}^{k} (N_t - 1) + (k - 1) + \sum_{t=1}^{k-1} (M^t - M^{t+1}) + (M^k - M^{k+1}) = \sum_{t=1}^{k} M^t N_t.$

Thus we have the fact that the sum of the ranks of the quadratic forms on the
left of (2.3) above is equal to the number of squared observations on the right.
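The rank count can be verified numerically for a particular layout. The sketch below is illustrative only: the function name `rank_sum` is hypothetical, the ranks are those quoted in the text, and it assumes the convention $M^{k+1} = 0$. The layout is the litter example of the introduction, with $M = (6, 5, 2)$ blocks and $N_t = (4, 1, 2)$ treatments per step.

```python
def rank_sum(M, N_steps):
    """Sum of the ranks of the five families of quadratic forms.

    M[t]       : number of blocks in step t+1 (non-increasing)
    N_steps[t] : number of treatments in step t+1
    Assumes the convention M^{k+1} = 0.
    """
    k = len(M)
    M_ext = list(M) + [0]  # append M^{k+1} = 0
    r1 = sum((M[t] - 1) * (N_steps[t] - 1) for t in range(k))
    r2 = sum(M[t + 1] - 1 for t in range(k - 1))
    r3 = sum(N_steps[t] - 1 for t in range(k))
    r4 = k - 1
    r5 = sum(M_ext[t] - M_ext[t + 1] for t in range(k))
    return r1 + r2 + r3 + r4 + r5


M = [6, 5, 2]          # blocks per step in the litter example
N_steps = [4, 1, 2]    # treatments per step
n_observations = sum(m * n for m, n in zip(M, N_steps))
print(rank_sum(M, N_steps), n_observations)  # 33 33
```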
We may now invoke a theorem proved by Madow [5] showing the quadratic
forms to be independent, and verifying the following distributions. (E will be
used to denote mathematical expectation, and $\chi'^2_{p,\lambda}$ will
represent a non-central chi square distribution with degrees of freedom p and
non-centrality $\lambda$.)
1. $q_1^t/\sigma^2$ is distributed as $\chi'^2_{p,\lambda}$, where
$p = (M^t - 1)(N_t - 1)$, $\lambda = 0$, since
$E(Y^t_{ij} - Y^t_{i.} - Y^t_{.j} + Y^t_{..}) = 0$.
2. qi/o’ is distributed as Xen; where p = (M‘*' — 1), \ = 0, since 
E(Y¢ — Yi" — Y'+ Y™") =0. 
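3. $q_3^t/\sigma^2$ is distributed as $\chi'^2_{p,\lambda}$, where $p = (N_t - 1)$,
$\lambda = \dfrac{M^t}{2\sigma^2} \sum_{j=N^{t-1}+1}^{N^t} (\alpha_j - \alpha^t_{.})^2$,
since
$E(Y^t_{.j} - Y^t_{..}) = \alpha_j - \alpha^t_{.}$.
4. $q_4^t/\sigma^2$ is distributed as $\chi'^2_{p,\lambda}$, where p = 1,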


= M*'N' Ness (4) ‘t a a’*)? 


2 g2N t+! ’ 
since 
r¢ur't yt+1 't t+1 
E(Y_' — YS") =a,'-—a.”. 


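Therefore it follows that

$\dfrac{1}{\sigma^2} \left[ \sum_{t=1}^{k} q_1^t + \sum_{t=1}^{k-1} q_2^t \right]$

is distributed as $\chi'^2_{p,\lambda}$, where

$p = (M^1 - 1)(N - 1) - \sum_{t=2}^{k} (M^1 - M^t) N_t,$

and $\lambda = 0$. Also we have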


pr +Za| 


. . . ‘2 
is distributed as x,,,, where 


«Fe - 04+0< oe +4, 


t=! 


k t nt k—1 t+lar7t 
‘en [4 ; = a‘)? | + een” = of" 


tool | 20° jaNe—141 =i 2eeN'H 
TABLE 3.1

Due to                           df               Sum of Squares
Blocks ignoring treatments       $M^1$            $\sum_i N_{i.} (Y_{i.})^2$
Treatments eliminating blocks    $N - 1$          $\sum_j Q_j \hat\alpha_j$
Error                            By subtraction
Total


Hence, we have v as defined in (2.1) above is distributed as $F'_{p,q,\lambda}$,
where p, q, and $\lambda$ are as defined in (2.2) above.

Now it is clear that $\lambda = 0$ if and only if
$\alpha_1 = \alpha_2 = \dots = \alpha_N$, since $\lambda$ is a sum of
non-negative terms and can be zero if and only if each term of the sum is zero.
Therefore to test the hypothesis $\alpha_1 = \alpha_2 = \dots = \alpha_N$ we use
v as Snedecor's F with p degrees of freedom and q degrees of freedom, where p
and q are as defined in (2.2).
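As a small worked instance, the degrees of freedom in (2.2) can be computed for the litter example of the introduction. The sketch below is illustrative: the function name `staircase_df` and the list-based encoding of $M^t$ and $N_t$ are assumptions made for this example, not notation from the paper.

```python
def staircase_df(M, N_steps):
    """Degrees of freedom (p, q) of the test function v, following (2.2).

    M[t]       : number of blocks in step t+1, with M[0] = M^1 the largest
    N_steps[t] : number of treatments in step t+1
    """
    n_treatments = sum(N_steps)
    p = n_treatments - 1
    q = (M[0] - 1) * (n_treatments - 1) - sum(
        (M[0] - m) * n for m, n in zip(M[1:], N_steps[1:])
    )
    return p, q


# Litter example: treatments 1-4 in 6 litters, treatment 5 in 5, treatments 6-7 in 2.
M = [6, 5, 2]
N_steps = [4, 1, 2]
p, q = staircase_df(M, N_steps)
n_observations = sum(m * n for m, n in zip(M, N_steps))
print(p, q)  # 6 21
```

Here q also equals the number of observations minus the number of blocks minus the number of treatments plus one (33 - 6 - 7 + 1 = 21), the usual error degrees of freedom for a connected two-way layout.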


3. The analysis of variance. In Section 2 it was shown that the test function
v could be used to test the hypothesis $\alpha_1 = \alpha_2 = \dots = \alpha_N$.
We will now show that v can be derived by the method of least squares. The model
can be considered as a two-way classification model with unequal numbers in the
subclasses. In this case the conventional analysis is given in Table 3.1 [4]. If
we now denote the mean square for treatments eliminating blocks by T and the
mean square for error by E, then W = T/E is the least squares test function used
to test the hypothesis $\alpha_1 = \alpha_2 = \dots = \alpha_N$. ($N_{i.}$ is the
number of treatments in the ith block.) We will now show that the function v in
Section 2 is the test criterion given by least squares.

In the above table, the Total SS minus the Block ignoring treatments SS is 
equal to 


k k 
D4 <a 2 a: 


t=1 


It remains only to show that

$\sum_{j=1}^{N} Q_j \hat\alpha_j = \sum_{t=1}^{k} q_3^t + \sum_{t=1}^{k-1} q_4^t$

and the rest follows by subtraction ($\hat\alpha_j$ is the least squares
estimate of $\alpha_j$).
We have the following system of normal equations: 


i % p= M’ +1, +2,---,M 
M* + 1, 


M + 1, 
1,2 
1.2 
(3.2.2) Yo =£+6 +4, j=N'+1,N'4+2,---,N’


ll 


G2k-1) YF'=a+R+ay", g= N*+1,N°*%4+2,---, Ne 
wae Yi =a+B+ai, y=N**4+1,N*74+2,---,M 


where 


B -_ ye By 





Mt 
Imposing the linear restriction &@, = 0, we find from (3.1.k) that 
a+ B= Yi, i= 1,2,---,M*. 
Substituting this into (3.2.k) we have 
, P*r* 4 mr rk—I sie P 
= ¥*, =~ ———, not 4-1, N** +2 eee 
Now since 
nei N 
i ye = » ay 
j=l jon*—'41 


under the restriction, a, = 0, we may now substitute back and solve (3.1.4 — 1) 
obtaining 
Ny, Hw ( 


a+—B= Yi. + «i 


Nel oN 
' Ny sera kl k « k-1 
=“ %+2 02 - v. ), i= M*+1,M*+2,+--,M 


gon yo Mt ya +, ¥)- Dinu: Yi, 
i es ae 





} ee ] * \ Atel 
[* ae M* me (y* — yy’) 
< yin 7 n*-? y*" a Nun * i N, cy - yy, 


NF-1 N 
j = N**+1,N** +2,---,N™. 
Finishing the solution in this manner, we obtain 
rpP—l yr/p—-l » ar ik r 
A a +A pY" - >. Ne Pa fr. 
N é t=p+l N t 


j= N”*+1,N"" +2, ---,N’,p =1,2,°°,k 


ap = Yi - 
which may be written as


ap = Yi, -—Y? — 


(3.3) 


Now 


_ M 
~ Mp 


rp—l 
N” 
N? 


Therefore 


k NP 


> Q; a; = Zz 


p=l j=NnNP—141 


Collecting coefficients of 


M**™ Nya: (N’)® 
~ ( Net )? 




M’ N; mM (Nva1)* 
’ 


ee tt 19 ~ 2 


t=p+1 < t 


= N74 1, N?*42,.--, 


ty! 


N” ,p 


MP 
a Ys. ’ 


i=l 


= My? — 


= N?*+1,N"'+2,--- 


> 


M , 
imwkyil i. 


M? 


NOT YS +N 


ye 
N* _—s 


«a ¥%) 


So MN, 


rt r/t—1 
sas = 1,2,---,k 
24, PN (Y. - Y. | » PP 


(Y% — rn | 


(vi 


NP 


2 


j=NP-141 


Q? a; 


> [ar 


p=l 

k , nN?! 

M?N aaa 

+ » 1 * Le 

+ ae as Y’ “|. | 
«-rey] 


)’, we have 


—Y?) 


N?* (ytra _ ye) 
MPN' Ne.” P 


t=p+l1 


2? 


t=p+l N 


Tr -~ Fy 
oi M' Ny M'*"(Nv41)" 


e M} (N+)? 


Mr (N**)2 


Combining the last r terms this becomes 


M™ Ne(N")’ 


Nt M*(N +1). _ mt Nr+1 N’(N’ + N41) 





wy 


(N r+1)2 (Nrt)? 
M"" Ney N’ 
wT N r+1 
Collecting coefficients of (Y’’ - Y’*)(Y"" - Y‘*), r < s, we have


_M™ Nia N’ Nowy MON, M™ Nea Now yg MENG M™ Noa News 
NN MNANS — 


_ M™ Nea NM Ne) MN, News Mi News 
Nrt M+ N*t ' . Nv Mr’ Neti nag 


M' Ni Nv M™ Negi 


+ Wet MN 


Combining all but the first term of each part gives 


_M™ New N’ Ness. NUM Neat Neg _ NegiN’ MAN 4s 
Nrti Ne 


Nett New Ne Net 


N’ Na M™ Nes 
+ ~ Net New 7 


Now since these two general terms are the only possible ones involved in the 
second summation of the expression for 


= 
> Q; a; ’ 
j=l 


we have 


= Q; 4; = >. Ba > (Y5- r2 | 


j=l p=1 j=NP—141 


— M"*'Ny41 N” 


4 


90 y?*!)? 


since $N^0 = 0$.
Now by subtraction the Error S.S. must be

$\sum_{t=1}^{k} q_1^t + \sum_{t=1}^{k-1} q_2^t.$

Also since the degrees of freedom for error and treatments eliminating blocks
in Table 3.1 are the same as q and p of (2.2), then we have W = v. Thus we
have shown that the test function given in Section 2 is that given by the method
of least squares.


4. Means and standard errors. We will now derive the best, linear, unbiased
estimates of $\alpha_j - \alpha_{j'}$ and the standard errors of these estimates.
THEOREM II.


Pr k 7 
nae = i - 2 at ~ 94, 


mg @ Y¥,- Yr --|; : i 
a : : N 1 
$j = N^{p-1} + 1, N^{p-1} + 2, \dots, N^p$;
$p = 1, 2, \dots, k$,


is the best, linear, unbiased estimate of $\alpha_j - \alpha_{.}$, and therefore
$\hat\alpha_s - \hat\alpha_u$ is the best, linear, unbiased estimate of
$\alpha_s - \alpha_u$.

PROOF. Since $\hat\alpha$ was found by the method of least squares (3.3) using
the linear restriction $\hat\alpha_{.} = 0$, this result is a consequence of the
Markoff Theorem [3].

THEOREM III. The variance of the estimate of $\alpha^p_s - \alpha^p_u$ is
$2\sigma^2/M^p$ if $s \ne u$, and $s, u = N^{p-1} + 1, N^{p-1} + 2, \dots, N^p$
for $p = 1, 2, \dots, k$. The variance of the estimate of
$\alpha^p_s - \alpha^r_u$ is


Po 3 
| 4 — ee 


M? N? wopu1 M'*N' NO 


fors = N®'+1,N"'+2,---,N?,u = N”'+4+1,N' + 2, 
12,---,kK-lr=pt+l1,p+2,-:-: 
PROOF 


and s, u 


MP e MP e 
° as - j=l - i 
Var (a? — a?) oS E | Seat Patent 


M? MP 
2 2 9,2 
o oC ao 
“ort ip ~ e’ 


NOY + NYE RM 


rt rht—1) o yr r'r—] 
Ni Ne =F. (Yu. + ¥.™) 


and by straightforward application of expected values we arrive at the result. 
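The first part of Theorem III (two treatments in the same step) can be checked exactly for a given layout by solving the least squares normal equations in rational arithmetic. The sketch below is not the authors' derivation: the function name `contrast_variance`, the 0-based treatment labels, and the reparametrization (one effect per block absorbing the general mean, last treatment effect fixed at zero) are assumptions chosen for this illustration. It returns the variance of the estimated contrast in units of $\sigma^2$.

```python
from fractions import Fraction


def contrast_variance(block_sizes, s, u):
    """Var(alpha_hat_s - alpha_hat_u) / sigma^2 for a staircase layout.

    Block i receives treatments 0 .. block_sizes[i]-1 (most important first).
    Parameters: one effect per block (absorbing the general mean) and
    treatment effects 0 .. N-2, with the last treatment effect fixed at 0.
    """
    n_blocks = len(block_sizes)
    n_trt = max(block_sizes)
    n_par = n_blocks + n_trt - 1

    # Accumulate X'X one observation (block i, treatment j) at a time.
    xtx = [[Fraction(0)] * n_par for _ in range(n_par)]
    for i, size in enumerate(block_sizes):
        for j in range(size):
            cols = [i] + ([n_blocks + j] if j < n_trt - 1 else [])
            for a in cols:
                for b in cols:
                    xtx[a][b] += 1

    # Contrast vector selecting alpha_s - alpha_u.
    contrast = [Fraction(0)] * n_par
    for idx, sign in ((s, 1), (u, -1)):
        if idx < n_trt - 1:
            contrast[n_blocks + idx] += sign

    # Solve (X'X) z = contrast by Gauss-Jordan elimination.
    aug = [row[:] + [c] for row, c in zip(xtx, contrast)]
    for col in range(n_par):
        pivot = next(r for r in range(col, n_par) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n_par):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    z = [row[-1] for row in aug]
    return sum(c * zi for c, zi in zip(contrast, z))


litters = [7, 7, 5, 5, 5, 4]
print(contrast_variance(litters, 0, 1))  # 1/3, i.e. 2/M^1 with M^1 = 6
print(contrast_variance(litters, 5, 6))  # 1,   i.e. 2/M^3 with M^3 = 2
```

In the litter example the first four treatments appear in six blocks, so a contrast between two of them has variance $2\sigma^2/6$; the last two treatments appear in two blocks, giving $2\sigma^2/2$.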
From the theory of least squares it follows that the error mean square

$\dfrac{1}{q} \left[ \sum_{t=1}^{k} q_1^t + \sum_{t=1}^{k-1} q_2^t \right]$

(where q is defined in (2.2)) is an unbiased estimate of $\sigma^2$ and is
independent of $\hat\alpha_j$. Therefore, these quantities may be used to set
confidence limits about the difference between treatment means or any linear
contrast of treatment means.
Therefore, by using equation (2.1), the analysis of variance for the Staircase
Design is easily computed. By using the formulas in Theorem II and Theorem
III, the means and standard errors can be easily computed even if the number
of steps is large. In another paper we will give detailed computing instructions
with a numerical example of the Staircase Design.
We have assumed throughout that the variance of $e_{ij}$ is a constant
independent of i and j. If the variance of $e_{ij}$ depends in some way on the
number of experimental units in a block, and if the numbers of units differ
widely, then this may somewhat invalidate the exact distributions of the
test function. The variances will probably have to be quite different before
the distributions are disturbed appreciably. On the other hand it may be that
an experimenter divides his material into homogeneous groups with constant
variances and finds he ends up with different numbers of plots in a block. This
would suggest using the staircase design.

Also this design may be useful in case an experimenter desires to conduct an 
experiment on two sets of treatments and is satisfied with different precisions 
on the two sets.
A NOTE ON THE FUNDAMENTAL IDENTITY OF
SEQUENTIAL ANALYSIS


By R. R. BAHADUR
University of Chicago


1. Introduction. This note points out that the fundamental identity of se- 
quential analysis [1] can be regarded as a special case of a formula for the prob- 
ability that the sampling terminates at some finite stage. This viewpoint, ex- 
plored in Sections 2 and 3, provides proofs of the identity, and of its differ- 
entiability under the expectation sign, that seem more intuitive than the proofs 
in the literature ([1], [2], [3], [5], [6]).

The formula also has application to the well-known problem (cf., e.g., [7], 
[8]) of evaluating the probability of eventual termination of a random walk 
on the real line, in the case when there is one fixed barrier and a drift away 
from the barrier. Some upper and lower bounds on the probability in question 
are obtained in Section 4. 

In concluding this introduction, the writer wishes to thank his colleague 
L. J. Savage for discussions and suggestions that have made a substantial con- 
tribution to this work. 


2. A Formula for $P(n < \infty)$. Let x be a real valued random variable with
distribution function F. It is assumed that the moment generating function

(1)  $\phi(t) = \int_{-\infty}^{\infty} e^{tx} \, dF$

exists for every real t in some neighbourhood of t = 0. Throughout this note,
t is restricted to real values for which $\phi$ exists.
Let $x_{(\infty)} = (x_1, x_2, \dots \text{ ad inf})$ denote a sequence of
independent and identically distributed observations on x. Consider a fixed
sequential sampling procedure, that is, a set of rules for observing the
components $x_1, x_2, \dots$ of $x_{(\infty)}$ one by one, such that at each
stage the decision whether experimentation is to continue is a (possibly
randomised) function of the observed values in hand at that stage. (Cf., e.g.,
[6], [9].) Let n denote the total number of components $x_m$ observed in a given
instance. It is assumed that the sampling procedure is closed under F, that is,

(2)  $P(n < \infty \mid F) = 1.$

The procedure is otherwise arbitrary.

Write $s = x_1 + \dots + x_n$ and

(3)  $\psi(t, n, s) = [\phi(t)]^{-n} e^{ts}$

if $n < \infty$, and write $\psi = 1$ (say) if $n = \infty$.
THEOREM 1. For every t,

(4)  $E(\psi(t, n, s) \mid F) = P(n < \infty \mid G_t),$

where

(5)  $dG_t = [\phi(t)]^{-1} e^{tx} \, dF.$

For each t, $G_t$ defined by (5) is clearly a probability distribution function,
so that $\{G_t\}$ is an exponential family of alternative distributions of x,
with F a member of this family. Such families of distributions have been studied
in various statistical contexts, including that of sequential analysis (cf.,
e.g., [1], [10], [11]). It may be added, however, that the notion of alternative
distributions is not essential to this paper, and the introduction here of the
family $\{G_t\}$ could be regarded as a device in the study of the given sampling
rule when F obtains. This device is, of course, a familiar one in probability
theory (cf., e.g., [3], [12], [13], [14]).
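For a discrete F, the family defined by (5) can be written out directly. The following sketch is illustrative only: the function name `tilt` and the dictionary representation of a distribution (value mapped to probability) are assumptions made for this example.

```python
import math


def tilt(dist, t):
    """Return (G_t, phi(t)) for a discrete distribution F.

    dist maps each value x to its probability under F.
    G_t assigns mass e^{t x} * F({x}) / phi(t), as in (5).
    """
    phi = sum(p * math.exp(t * x) for x, p in dist.items())
    tilted = {x: p * math.exp(t * x) / phi for x, p in dist.items()}
    return tilted, phi


F = {-1: 0.7, 1: 0.3}           # a walk with negative drift under F
G, phi_t = tilt(F, 0.5)
mean_F = sum(x * p for x, p in F.items())
mean_G = sum(x * p for x, p in G.items())
print(sum(G.values()))           # total mass of G_t (1 up to rounding)
print(mean_F, mean_G)            # a positive t shifts mass toward larger x
```

Taking t = 0 returns F itself, which is the sense in which F is a member of the family.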

To establish Theorem 1, for each m = 1, 2, ... let $R^{(m)}$ denote the
nonsequential sample space of exactly m observations, that is, of points

$(x_1, \dots, x_m) = x_{(m)}$

say. For each m, let $\alpha_m(x_{(m)})$ be the conditional probability of the
event n = m given $x_{(\infty)}$. The sequence $\alpha_1, \alpha_2, \dots$ of
functions on $R^{(1)}, R^{(2)}, \dots$ characterizes the given sampling
procedure, and is, of course, independent of the distribution of x (cf., e.g.,
[9]). For each m, let $F^{(m)}$ denote the distribution function of $x_{(m)}$
when F obtains, that is, $F^{(m)}(x_1, \dots, x_m) = \prod_{i=1}^{m} F(x_i)$.

Let v denote the total outcome of the sequential experiment, that is,
$v = (x_1, \dots, x_n)$ if $n < \infty$ and $v = x_{(\infty)}$ if $n = \infty$.
We note that if h is a real valued function of v such that
$E(|h| \mid F) < \infty$, and (2) holds, then

(6)  $E(h \mid F) = \sum_{m=1}^{\infty} \int_{R^{(m)}} \alpha_m \cdot h_m \, dF^{(m)},$

where $h_1, h_2, \dots$ is the (essentially unique) sequence of functions on
$R^{(1)}, R^{(2)}, \dots$ such that $h_m = h$ when n = m (cf., e.g., [9]). The
right side of (6) is an absolutely convergent series; in fact, h is integrable
if and only if

$\sum_m \int \alpha_m \cdot |h_m| \, dF^{(m)} < \infty.$


In accordance with the above notation, for each m = 1, 2, ... let $s_m$ denote
the nonsequential random variable $x_1 + \dots + x_m$. Then it is easy to
establish (4) thus:

$P(n < \infty \mid G_t) = \sum_{m=1}^{\infty} P(n = m \mid G_t)$

$\qquad = \sum_{m=1}^{\infty} \int_{R^{(m)}} \alpha_m \, dG_t^{(m)}$

$\qquad = \sum_{m=1}^{\infty} \int_{R^{(m)}} \alpha_m \cdot \phi^{-m} \cdot e^{t s_m} \, dF^{(m)}$  by (5)

$\qquad = E(\psi \mid F)$  by (2), (3), and (6).

3. Wald's identity. It follows from Theorem 1 that Wald's identity, namely

(8)  $E[\psi(t, n, s) \mid F] = 1,$

holds for a given value of t if and only if

(9)  $P(n < \infty \mid G_t) = 1$

for the same t value.

It follows easily from the preceding remark that a sufficient condition for the
validity of (8) for all t is that there exist a finite k such that
$P(n < k \mid F) = 1$. The same remark, together with the strong law of large
numbers and the law of the iterated logarithm (cf. e.g., [15]), also yields the
following sufficient condition for the same conclusion (assuming that F is a
non-degenerate distribution): there exists an h, $-\infty < h < \infty$, and two
sequences $\{a_m\}$ and $\{b_m\}$ such that

(i) $a_m = mh + o(\sqrt{m \log \log m})$, $b_m = mh + o(\sqrt{m \log \log m})$ as $m \to \infty$
and

(ii) for each $x_{(\infty)} = (x_1, x_2, \dots)$, either $n < \infty$ or
$a_m < s_m < b_m$ for all sufficiently large m. This condition seems weaker than
other structural conditions of the same type in the literature for the validity
of (8) for all t.
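The bounded case can be illustrated by exact enumeration. The sketch below is only an example under stated assumptions: the function name `wald_expectation`, the two-barrier stopping rule, and the integer-valued step distribution are all choices made here, not part of the note. Sampling stops when the cumulative sum leaves the open interval (lower, upper), or at stage k at the latest, so that n is bounded; the function then evaluates $E[\phi(t)^{-n} e^{ts} \mid F]$ by dynamic programming.

```python
import math


def wald_expectation(dist, t, lower, upper, k):
    """Evaluate E[phi(t)^(-n) * e^(t*s) | F] for a truncated two-barrier rule.

    dist maps integer step values to probabilities under F. Sampling stops
    at the first m with s_m <= lower or s_m >= upper, or at m = k if neither
    barrier has been reached by then.
    """
    phi = sum(p * math.exp(t * x) for x, p in dist.items())
    alive = {0: 1.0}  # mass of paths not yet stopped, keyed by partial sum
    total = 0.0
    for m in range(1, k + 1):
        nxt = {}
        for s, mass in alive.items():
            for x, p in dist.items():
                s2 = s + x
                if s2 <= lower or s2 >= upper or m == k:
                    # path stops at stage m with cumulative sum s2
                    total += mass * p * math.exp(t * s2) / phi ** m
                else:
                    nxt[s2] = nxt.get(s2, 0.0) + mass * p
        alive = nxt
    return total


F = {-1: 0.7, 1: 0.3}
value = wald_expectation(F, t=0.4, lower=-3, upper=3, k=12)
print(value)  # equals 1 up to rounding error
```

Changing t or the barriers should leave the value at 1 (up to floating point error) as long as the truncation at stage k is kept.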

It may be noted here that the first paragraph of this section also suggests
examples where Wald's identity fails to hold for all t. This was pointed out to
the writer by L. J. Savage. The discussion in Section 4 concerns a general
example of this sort.

Next, we shall describe an alternative sufficient condition for the validity of
(9) (and thereby of (8)) in an assigned neighbourhood of t = 0, given (2) and
(5). This condition does not depend on the detailed structure of the sampling
rule; it is, essentially, that under F the joint moment generating function of n
and s exist in a sufficiently large neighbourhood of the origin. As it happens,
the condition also assures the validity of differentiation under the expectation
sign in (8), that is, $D^k \psi(t, n, s)$ is integrable and

(10)  $E(D^k \psi \mid F) = 0$  for k = 1, 2, ...

and each t in the neighbourhood, where $D^k = d^k/dt^k$.
Let I be an open interval including t = 0 such that $\phi(t)$ exists for each t
in I.
THEOREM 2. Suppose that corresponding to each t in I there exists a

(11)  $z > -\log \phi(t)$

such that

(12)  $E(e^{ts + zn} \mid F) < \infty.$

Then (8), (9), and (10) hold for t in I.

The proof of Theorem 2 will be indicated later, but first some remarks by 
way of discussion of its hypothesis. 

REMARK 1. Let C denote the set of all points (t, z) in the plane such that
(12) holds. Then (a) C is a convex set, (b) $(t, z) \in C$ implies
$(t, z') \in C$ for all $z' \le z$, since n is non-negative, and (c) each point
on the graph of the function $z = -\log \phi(t)$ is in C, by (3) and (4). It
follows, in particular, that (d) the hypothesis of Theorem 2 is that the graph
of $-\log \phi(t)$ lie in the interior of C, at least when t is restricted to I.
It should be noted also (e) that $z = -\log \phi$ is a concave function of t,
possessing derivatives of all orders, with z(0) = 0, and
$z'(0) = -E(x \mid F)$, by (1). The facts (a), (b), (c), (d), and (e) are useful
in the proof of Theorem 2, and also in the deduction of special sufficient
conditions for the validity of (9) and (10). (Cf. Remarks 3 and 4 below.)

REMARK 2. In the statement of Theorem 2, as also in Remark 1(d) above,
the hypothesis is stated in terms of the given distribution F. The hypothesis
can also be stated in terms of the associated distributions $G_t$, as follows:
for each t in I, the conditional moment generating function of n given
$n < \infty$ exists in some neighbourhood of zero when $G_t$ obtains, that is,
$E(e^{\delta n} \mid G_t, n < \infty) = \sum_m e^{\delta m} P(n = m \mid G_t, n < \infty) < \infty$
for some $\delta > 0$. This alternative formulation follows readily from (5),
(6), (11) and (12).


REMARK 3. Suppose that $\phi$ exists for all t. If

(13)  $E(e^{zn} \mid F) < \infty$

for some $z > \sup \{-\log \phi(t): -\infty < t < \infty\}$, and if

(14)  $E(e^{ts} \mid F) < \infty$

for all t, then (8), (9), and (10) hold for all t. A stronger sufficient
condition for the same conclusion is that (13) hold for all z.

REMARK 4. If (13) holds for some z > 0, then n and s possess moments of
all orders, and there exists a neighbourhood of t = 0 in which (8), (9) and (10)
hold. A stronger sufficient condition for the same conclusion is that
$E(x \mid F) \ne 0$ and (14) holds for some t of the same sign as
$E(x \mid F)$. These conditions are of interest since the validity of (10) in a
neighbourhood of t = 0 is sufficient for most applications (cf. [2]) of
differentiation.

REMARK 5. A theorem of Albert [3], [4] states that if $\phi$ exists for all t,
if

$P(x > 0 \mid F) > 0$

and $P(x < 0 \mid F) > 0$, and if the sampling procedure is a random walk based
on the cumulative sums $s_m$ (with fixed barriers a and b, a < 0 < b), then (8)
and (10) hold for all t. It can be shown (cf. Lemma 2 of [3] and Remark 3 above)
that in this case the hypothesis of Theorem 2 is satisfied by
$I = (-\infty, \infty)$, so that Albert's theorem is a special case of Theorem 2.

REMARK 6. In his applications of martingale theory to sequential analysis,
Doob [6] derives from Theorem 2.2 of [6] a sufficient condition for the validity
of (8) for a given t. It can be shown that if this condition holds for each t in
I then the hypothesis of Theorem 2 is satisfied. A fuller discussion of the
relation between Doob's Theorem 2.2 in its application to the present case and
Theorems 1 and 2 of this note would be worthwhile, but cannot be undertaken here.
REMARK 7. It is easy to see that $P(n < k \mid F) = 1$ for some finite k implies
the hypothesis of Theorem 2. L. J. Savage has constructed examples showing
that the other condition stated in the second paragraph of this section does
not imply the hypothesis of Theorem 2; in fact, (10) fails for k = 2 in these
examples.

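We turn now to the proof of Theorem 2. The first step is to show that (9)
holds in a sufficiently small neighbourhood of zero, that is,

(15)  $\sum_m p_m(t) = 1$

for all t in the neighbourhood, where $p_m(t)$ is an abbreviation of
$P(n = m \mid G_t)$, so that

(16)  $p_m(t) = \int_{R^{(m)}} \alpha_m \cdot [\phi(t)]^{-m} e^{t s_m} \, dF^{(m)}$

by (5). Write $\beta_m = 1 - \sum_{i=1}^{m} \alpha_i$ and
$\rho(t) = \phi(2t)/\phi^2(t)$. Then, for any t in I and any m,

(17)
$P(n > m \mid G_t) = \int_{R^{(m)}} \beta_m \, dG_t^{(m)}$  since $\beta_m = P(n > m \mid x_{(m)})$

$\qquad = \phi^{-m} \int_{R^{(m)}} \beta_m e^{t s_m} \, dF^{(m)}$  by (5)

$\qquad \le \phi^{-m} \left\{ \int_{R^{(m)}} \beta_m^2 \, dF^{(m)} \right\}^{1/2} \left\{ \int_{R^{(m)}} e^{2 t s_m} \, dF^{(m)} \right\}^{1/2}$

$\qquad \le \phi^{-m} \left\{ \int_{R^{(m)}} \beta_m \, dF^{(m)} \right\}^{1/2} [\phi(2t)]^{m/2}$  since $0 \le \beta_m \le 1$

$\qquad = \sqrt{\rho(t)^m \cdot P(n > m \mid F)}.$

It follows from (11) and (12) with t = 0 that, for some $\lambda > 1$,

$\lambda^m P(n > m \mid F) \to 0$

as $m \to \infty$. Hence, by (17), $P(n > m \mid G_t) \to 0$ as $m \to \infty$
for each t such that $\rho(t) \le \lambda$. Thus (15) holds whenever
$\rho(t) \le \lambda$. This establishes the desired conclusion, since
$\lambda > 1$, $\rho$ is continuous, and $\rho(0) = 1$.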

The next step is to extend the validity of (15) to all t in I by analytic
continuation, as follows. Let w denote the complex variable t + iu, and let
$\phi(w)$ be defined by (1) in the strip $\{w : t \in I\}$. For each m let
$p_m(w)$ be defined by (16) whenever $\phi(w)$ is defined and $\ne 0$. It follows
from the differentiability of moment generating functions that the functions
$p_m$ are differentiable everywhere in their domain of definition. A
straightforward argument based on the continuity of $\phi$, Remark 1(d) above,
formula (6), and the convexity of the exponential function shows that
corresponding to each t in I there exists a complex neighbourhood of t, N(t)
say, and a convergent series of positive terms, $\sum_m c_m(t)$ say, such that
$|p_m(w)| \le c_m(t)$ for all $w \in N(t)$ and each m = 1, 2, .... The details
of this argument are omitted. It follows hence that $\sum_m p_m(w)$ is a
uniformly convergent series of analytic functions, so that $\sum_m p_m(w)$ is
well defined and analytic in N(t). (Cf. e.g., [16].) Since this holds for each
t, it follows from the preceding paragraph that $\sum_m p_m(w) = 1$ for
$w \in N(t)$ and $t \in I$; in particular, (15) holds for all $t \in I$.

It follows from uniform convergence (cf. [16]) and the conclusion of the
preceding paragraph that for each t in I and every k = 1, 2, ...,

$\sum_m (d^k/dw^k) p_m(w)$

is well defined and = 0 for $w \in N(t)$; in particular,
$\sum_m D^k p_m(t) = 0$. Since, as is readily seen, $D^k$ commutes with the
integral sign in (16), we have

(18)  $\sum_m D^k p_m(t) = \sum_m \int_{R^{(m)}} \alpha_m D^k \{\phi^{-m} e^{t s_m}\} \, dF^{(m)} = 0$  for k = 1, 2, ...

and each t in I. Assuming for the moment that each of the functions
$D^k \psi(t, n, s)$ is integrable when F obtains, it follows by inspection from
(3), (6), and (18) that (10) holds for each t in I.

The next and final step in the proof is therefore to verify that each

$D^k \psi(t, n, s)$

is integrable. Since $D^k \psi$ is of the form $\psi \cdot \pi$ where $\pi$ is a
polynomial in n and s, it suffices to show that, for each t,
$\psi \cdot |s|^i \cdot n^j$ is integrable for i, j = 0, 1, 2, .... This may be
established by showing that corresponding to each t in I there exist positive
numbers $\epsilon$ and $\delta$ such that
$\psi(t, n, s) \cdot \exp(\epsilon |s| + \delta n)$ is integrable. Since
$\exp(\epsilon |s|) < \exp(\epsilon s) + \exp(-\epsilon s)$, it is easily seen
from (3) and Remark 1(d) that this last condition is satisfied.

In concluding this section, we remark that Theorems 1 and 2 can be generalized,
by straightforward extensions of the arguments used here, to the case when
$x_1, x_2, \dots$ is a sequence of independent but not necessarily identically
distributed random variables, and each $x_m$ takes values in k-dimensional
Cartesian space, $1 \le k < \infty$. Another straightforward generalization that
may be worth mentioning here is to the case where the sampling rule is defined
for a sequence $w_1, w_2, \dots$ of independent abstract random variables, and
for each i = 1, 2, ..., $x_i$ is a real (or vector) function of $w_i$.


4. An application. Let c be a positive constant, and let the sampling rule be
defined thus: for any sequence $x_{(\infty)} = (x_1, x_2, \dots)$, n is the
smallest integer m such that $s_m = x_1 + \dots + x_m \ge c$, and $n = \infty$
if no such integer exists. Let G be a given distribution function such that
$P(x > 0 \mid G) > 0$ and

$P(x < 0 \mid G) > 0,$

and such that $E(x \mid G)$ exists and is negative. It is shown in this section
that if G admits a moment generating function, Theorem 1 can be used to obtain
upper and lower bounds for $P(n < \infty \mid G)$. In certain special cases this
method yields the exact value of $P(n < \infty \mid G)$.

The probability in question can be interpreted as the probability of ultimate
ruin in playing an advantageous gamble long enough (with $x_m$ the amount
lost in the mth play, each $x_m$ distributed according to G, and c the initial
fortune of the player), and has been studied in connection with insurance theory
(cf. [8], [12]). In [8] Dubourdieu has given derivations and original references
to an upper bound, due to de Finetti, for this probability. The upper bounds
obtained here are improvements of de Finetti's.

It is assumed henceforth that

(19)  $\eta(h) = \int e^{hx} \, dG$

exists in a neighbourhood of h = 0. It then follows from the preceding
hypotheses concerning G, by well-known properties of moment generating
functions, that there exist uniquely determined points a and b (say), in the
interior of the interval in which $\eta$ exists, such that 0 < a < b,
$\eta'(a) = 0$, and $\eta(b) = 1$. We note that

(20)  $\eta'(h) \ge 0$  for $h \ge a$

and that

(21)  $\eta(h) < 1$ for $a \le h < b$;  $\eta(h) = 1$ for $h = b$;  $\eta(h) > 1$ for $h > b$.

In (20), (21), and in what follows, h is understood to be restricted to the
interior of the interval in which $\eta$ exists.
Now choose and fix an $h \ge a$ and define

(22)  $dF_h = [\eta(h)]^{-1} e^{hx} \, dG.$

Then the moment generating function of $F_h$, say $\phi$, is given by

$\phi(t) = \eta(t + h)/\eta(h).$

Since $E(x \mid F_h) = \phi'(0) = \eta'(h)/\eta(h)$, it follows from (20) and the
choice of h that $E(x \mid F_h) \ge 0$. Consequently, by well-known properties of
cumulative sums, $P(n < \infty \mid F_h) = 1$. Since
$dG = \eta(h) e^{-hx} \, dF_h$ by (22), and

$\eta(h) = [\phi(-h)]^{-1},$

Theorem 1 yields the identity

(23)  $P(n < \infty \mid G) = E(e^{-hs} [\eta(h)]^n \mid F_h),$

valid for all $h \ge a$.


Letting h = b in (23), we have

(24)  $P(n < \infty \mid G) = E(e^{-bs} \mid F_b)$

by (21). Since $n < \infty$ implies $s \ge c$, since
$P(n < \infty \mid F_b) = 1$, and since b > 0, (24) yields

(25)  $P(n < \infty \mid G) \le e^{-bc},$

which is de Finetti's inequality.

It is clear from the preceding derivation of (25) that the equality sign holds
in (25) if and only if $P(s = c \mid F_b) = 1$. This condition can be shown to be
equivalent to the condition that x be a discrete random variable taking only
one positive value, say d, and that the negative values of x be integral
multiples of d. The condition is satisfied, in particular, if x takes only two
values.
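A numerical illustration of (25) in a two-valued case: take x = +1 with probability 0.3 and x = -1 with probability 0.7 under G, and c = 4. The sketch below is an example under stated assumptions: the function names `mgf` and `positive_root` and the bisection bracket are choices made here. For this simple walk, the probability of ever reaching the level c is known from the classical gambler's ruin formula to be $(0.3/0.7)^c$, which the bound $e^{-bc}$ should reproduce.

```python
import math


def mgf(dist, h):
    """eta(h) = E(e^{h x} | G) for a discrete distribution G."""
    return sum(p * math.exp(h * x) for x, p in dist.items())


def positive_root(dist, lo=1e-6, hi=50.0, iterations=200):
    """Find b > 0 with eta(b) = 1 by bisection.

    Assumes eta(lo) < 1 < eta(hi), which holds when E(x | G) < 0,
    P(x > 0 | G) > 0, and the bracket [lo, hi] is wide enough.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mgf(dist, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


G = {1: 0.3, -1: 0.7}
c = 4
b = positive_root(G)
bound = math.exp(-b * c)
exact = (0.3 / 0.7) ** c  # classical result for the simple random walk
print(b, bound, exact)
```

Here $b = \log(7/3)$, and the bound coincides with the exact probability, in line with the remark that equality holds when x takes only two values.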

We turn now to the case when s can exceed c with positive probability. In
this case, the effect of the 'excess over the boundary' can be estimated by
means of an argument due to Wald [1]. It is possible and advantageous to apply
the argument to (23) rather than (24), as follows.
Suppose that $F_h$ obtains. Write y = c if n = 1, and

$y = c - (x_1 + \dots + x_{n-1})$

if $1 < n < \infty$. Then y is well defined and $0 < y < \infty$ with
probability 1. Let $\xi$ denote the conditional expectation of $e^{-h x_n}$ given
n and y. It is not difficult to see that $\xi$ depends only on y and h; in fact

(26)  $\xi = E(e^{-hx} \mid x \ge y, F_h).$

We observe next that $s = c - y + x_n$ with probability one. Consequently, the
right side of (23) can be written as
$e^{-hc} \cdot E(e^{hy} \cdot \xi(y) \cdot \eta^n \mid F_h)$. It follows hence,
by regarding y as a real variable confined to positive values, and setting

(27)  $f(h) = \inf_y \{e^{hy} \cdot \xi\}, \quad g(h) = \sup_y \{e^{hy} \cdot \xi\},$

that

(28)  $e^{-hc} \cdot f(h) \cdot E(\eta^n \mid F_h) \le P(n < \infty \mid G) \le e^{-hc} \cdot g(h) \cdot E(\eta^n \mid F_h).$


? In this case a slight extension of the methods of this paper can be used to obtain the 
probability distribution of $n$, with $\infty$ a possible value of $n$.
Wald used the argument, in the context of a random walk with two absorbing
barriers, to find the maximum possible effect of the excess over a barrier on
the probability of absorption in that barrier. However, the argument also
yields the minimum possible effect, in Wald's context as well as in the
present one.
Next, an easy calculation using (22), (26), and (27) shows that

(29)    $f(h) = \inf_y \{e^{hy} / E(e^{hx} \mid x \ge y, G)\}$,
        $g(h) = \sup_y \{e^{hy} / E(e^{hx} \mid x \ge y, G)\}$.

Finally, since $n \ge 1$, we see from (21) and (28) that

(30)    $P(n < \infty \mid G) \le e^{-hc}\, g(h)\, \eta(h)$

for $a \le h \le b$, and

(31)    $P(n < \infty \mid G) \ge e^{-hc}\, f(h)\, \eta(h)$

for $h \ge b$.

The infimum of the right side of (30) with $h$ restricted to $[a, b]$ gives, of
course, the best upper bound obtainable by this method, while the supremum of
the right side of (31) with $h$ restricted to $[b, \infty)$ gives the best lower
bound. In particular, taking $h = b$ in (30) and (31), we have

(32)    $e^{-bc} f(b) \le P(n < \infty \mid G) \le e^{-bc} g(b)$.

Another special bound is

(33)    $P(n < \infty \mid G) \le \inf_{a \le h \le b} \{e^{-hc}\, \eta(h)\}$;

this follows from (30) since $0 < g \le 1$.

It is easy to see from the preceding argument that in case $x$ is bounded from
above, $\eta$ can be replaced by $\eta^{k}$ in (30), (31) and (33), where $k$ is
the least positive integer such that $P(n = k \mid G) > 0$.


In concluding this section let us consider an example. In this example,

(34)    $G = pH + (1 - p)K$

where $0 < p < 1$, $H$ is some distribution function (possibly degenerate)
confined to $(-\infty, 0]$, and

        $dK(x) = \lambda e^{-\lambda x}\, dx$ for $x > 0$,   $dK(x) = 0$ otherwise,

where $\lambda$ is a positive constant. It is assumed that $H$ (and therefore
$G$) admits a moment generating function in a neighbourhood of the origin, and
that

        $E(x \mid G) = pE(x \mid H) + (1 - p)/\lambda < 0$.

It then follows that the equation

(36)    $\eta(b) = pE(e^{bx} \mid H) + (1 - p)(\lambda/(\lambda - b)) = 1$

has a unique non-zero solution $b$, with $0 < b < \lambda$.

A simple calculation, which is omitted, shows that in the present case we
have $f(h) = g(h) = (\lambda - h)/\lambda$ for all $h$, where $f$ and $g$ are
defined by (29). Consequently, it follows from (32) that

(37)    $P(n < \infty \mid G) = e^{-bc}\,(\lambda - b)/\lambda$.
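
The closing example can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes H to be a point mass at -d (one admissible "degenerate" choice; the names `d`, `eta`, `solve_b` and `crossing_probability` are hypothetical, not from the paper), solves equation (36) for b by bisection, and evaluates (37) alongside de Finetti's bound (25).

```python
import math

def eta(h, p, lam, d):
    """MGF of G = p*H + (1-p)*K for h < lam, assuming H is a point mass at -d
    (an illustrative choice) and K is exponential with rate lam."""
    return p * math.exp(-h * d) + (1.0 - p) * lam / (lam - h)

def solve_b(p, lam, d, iters=200):
    """Non-zero root b of eta(b) = 1 in (0, lam), found by bisection."""
    # The construction needs negative drift: E(x|G) = -p*d + (1-p)/lam < 0.
    if -p * d + (1.0 - p) / lam >= 0:
        raise ValueError("E(x|G) must be negative")
    lo = lam / 2.0
    while eta(lo, p, lam, d) >= 1.0:   # shrink until lo lies in (0, b)
        lo /= 2.0
    hi = lam * (1.0 - 1e-12)           # eta is very large just below lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eta(mid, p, lam, d) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crossing_probability(c, p, lam, d):
    """P(n < infinity | G) as given by (37): exp(-b*c) * (lam - b) / lam."""
    b = solve_b(p, lam, d)
    return math.exp(-b * c) * (lam - b) / lam

if __name__ == "__main__":
    p, lam, d, c = 0.6, 1.0, 1.0, 2.0
    b = solve_b(p, lam, d)
    print("b =", b)
    print("exact value (37)  =", crossing_probability(c, p, lam, d))
    print("de Finetti bound  =", math.exp(-b * c))
```

Because (λ - b)/λ < 1, the value from (37) always lies strictly below the bound e^{-bc} of (25), which is the effect of the excess over the boundary in this example.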





ON THE STATISTICAL TREATMENT OF STOCHASTIC
PROCESSES


By ISIDORE FLEISCHER AND ANTHONY KOOHARIAN
Bell Telephone Laboratories 


1. Introduction. Grenander [3] has shown the feasibility of applying statis- 
tical techniques to stochastic processes. The basic tool involved in his approach 
is a representation theory for stochastic functions developed by Karhunen 
[4]. 

The present paper is an attempt to develop this approach on a systematic 
basis. Our analysis hinges on Theorem 2 and its corollary, a sharpened version 
of the Karhunen representation. 

The statistical concepts we apply are the Neyman-Pearson criterion for hy- 
pothesis testing and the maximum likelihood estimate. 


2. Basic theory. We shall require the following assumptions throughout the 
analysis: 

(*) The stochastic function $x(t, \omega)$ is square integrable on $\Omega$ for
each $t \in T$. The covariance function $r(s, t)$ is continuous in each variable
and $r(t, t)$ is integrable on $T$.$^1$

Although Karhunen [4] has established a representation theory for stochastic
functions under much wider conditions, the following result (Karhunen [5])
suffices in our case:

THEOREM 1. If $x(t, \omega)$ satisfies (*) and has mean value $m(t) = 0$, then

(1)    $x(t, \omega) = \sum_{k=1}^{\infty} \dfrac{z_k(\omega)}{\sqrt{\lambda_k}}\, \phi_k(t)$,

where the equality means convergence in the mean on $\Omega$ for each $t \in T$.
The $\lambda_k$ and $\phi_k$ are the eigenvalues and eigenfunctions,
respectively, of the integral equation

(2)    $\phi_k(s) = \lambda_k \int_T r(s, t)\, \phi_k(t)\, dt$,   $k = 1, 2, \cdots$

and the $\{z_k\}$ are a set of mutually uncorrelated random variables with zero
mean. In addition, the following relations hold:

(3)    $E\{z_k(\omega)\, x(t, \omega)\} = \phi_k(t)/\sqrt{\lambda_k}$,

(4)    $\int_T \phi_k(t)\, x(t, \omega)\, dt = z_k(\omega)/\sqrt{\lambda_k}$.
Received July 31, 1957; revised November 12, 1957. 
$^1$ $T$ may typically be taken as an arbitrary interval on the real line,
although the entire analysis could be carried through considering $T$ to be a
topological space possessing a $\sigma$-finite Borel measure.


Remark 1. Equation (4) involves the definite integral of a stochastic function
with respect to its "indexing" or $t$ parameter. Unless stated to the contrary,
such an integral is to be taken in the sense of Karhunen [4]; namely, that for
arbitrary $u \in L_2(\Omega)$

(4')    $\int_T \phi_k(t)\, E\{\bar{u}(\omega)\, x(t, \omega)\}\, dt = E\{\bar{u}(\omega)\, z_k(\omega)/\sqrt{\lambda_k}\}$.

Remark 2. The representation (1) is actually the K-integral of the stochastic
function $z(k)$ against $\phi_k(t)$ over the indexing set $k = 1, 2, \cdots$ with
measure $1/\sqrt{\lambda_k}$.

In other terms the K-representation converges to $x(t)$ ($\Omega$ weakly) for
each $t \in T$.
We next prove 

THEOREM 2. The K-representation converges to $x(t)$ ($\Omega$ weakly) in the
mean on $T$.

Proof. By (*) the complex valued functions $E\{\bar{u}\, x(t)\}$ for arbitrary
$u \in L_2(\Omega)$ are square integrable on $T$. In view of Remark 1 the
Fourier coefficients of $E\{\bar{u}\, x(t)\}$ with respect to the orthonormal
set $\{\phi_k\}$ are precisely $E\{\bar{u}\, z_k/\sqrt{\lambda_k}\}$. The theorem
then follows if it can be shown that for any square integrable
$\phi \perp \phi_k$, $k = 1, 2, \cdots$,

        $I(u) = \int_T E\{\bar{u}\, x(t)\}\, \phi(t)\, dt$

vanishes identically for $u \in L_2(\Omega)$.

The linear functional $I$, however, is easily shown to be bounded which,
together with the fact that by Mercer's theorem

        $I\{x(s)\} = 0$ for $s \in T$,

suffices.

COROLLARY. If, in addition to (*), $x(t, \omega) \in L_2(T)$ for almost all
$\omega \in \Omega$, then the $\{z_k\}$ defined by Eq. (4) may be identified with
the ordinary Fourier coefficients of $x(t)$ with respect to $\{\phi_k\}$, so that
the K-representation converges in the mean on $T$ wherever $x(t) \in L_2(T)$.

Remark 3. The additional hypothesis involved in the corollary is satisfied
if $x(t, \omega)$ is measurable on the product space $T \times \Omega$.

S. P. Lloyd has pointed out that this is the case if $x(t)$ is taken as one of
its standard modifications ([2], pp. 61-65).


3. Hypothesis testing for stochastic functions. In this section we shall apply 
the Neyman-Pearson test to stochastic functions satisfying the corollary. 

Let the probability measure on $\Omega$ according to the null hypothesis be
$P_0$ and according to the alternative be $P_1$. Then if
$\{\lambda_k^0, \phi_k^0\}$ and $\{\lambda_k^1, \phi_k^1\}$ denote the eigenvalues
and eigenfunctions associated with the integral equations for the respective
covariance functions, $r_0$ and $r_1$, we obtain the dual representations

(5)    $x(t, \omega) = \sum_{k=1}^{\infty} \dfrac{z_k^0(\omega)}{\sqrt{\lambda_k^0}}\, \phi_k^0(t) + m^0(t)$

and

(6)    $x(t, \omega) = \sum_{k=1}^{\infty} \dfrac{z_k^1(\omega)}{\sqrt{\lambda_k^1}}\, \phi_k^1(t) + m^1(t)$

in the sense of the corollary, where $m^0$ and $m^1$ are the respective means.
It may easily be shown that

(7)    $\dfrac{z_j^1}{\sqrt{\lambda_j^1}} = \sum_{k=1}^{\infty} a_{jk}\, \dfrac{z_k^0}{\sqrt{\lambda_k^0}} + b_j$

almost everywhere $P_0$ and $P_1$, where

        $a_{jk} = \int_T \phi_j^1(t)\, \phi_k^0(t)\, dt$,

        $b_j = \int_T \phi_j^1(t)\, [m^0(t) - m^1(t)]\, dt$.

Assuming that the finite dimensional measures induced by $P_0$ and $P_1$ on
the sample space of the $z$'s are absolutely continuous with respective density
functions $g_N^0$ and $g_N^1$, then the likelihood ratio is


Vx, dom = +04) 


Vy 


The martingale convergence theorem [2] then implies that $l_N$ converges to the
likelihood ratio on the infinite sample space almost everywhere $P_0$. Hence

THEOREM 3. The Neyman-Pearson critical region for the rejection of $P_0$ against
$P_1$ is

        $\lim_{N \to \infty} l_N \ge k$.

An application. It follows from Eq. (4) that if $x(t)$ is a normal process, then
the $z_i$ are normally distributed as well. Since the $z_i$ are uncorrelated they
are now independent so that their finite dimensional densities are explicitly
known in terms of the eigenvalues.
Thus in testing for the covariance function in the normal case with zero mean,
the appropriate likelihood ratio is the limit of


Ji. os - AN 07.0\2 1 — a; ie z; *1 
(9) eo (3 > Ne(Ze)” — Me dX Vr) If 
As an illustration consider the case where it is desired to test the hypothesis 


that a stationary normal process on the interval (0, 27) with mean zero has the 
covariance function. 
ro(r)


against the alternative 


Here we must have 


0 yes >~ a= Dd 6=1 


—-z =——2 
The reader will readily convince himself that 


0/ ikt | 
g(t) = g(t) =e , : _ 
By 
It follows that the matrix transforming the $z^0$ into the $z^1$ is the identity
matrix $a_{jk} = \delta_{jk}$. The likelihood ratio is thus the limit of


( N 2 2\\ 
(10) ly(zi, +--+ ,2y) = 4 = a exp 4+ Ze (ny (A — =) 
— a N \ J 


k=l a, B? 


Since the logarithm is monotone and continuous, we may describe the critical 


region by 
0)3 
(1 — v1) 2 K, 


N~n i 


lim log Il ¥ + > = 


k=] Oy 


where 7, = ax/&. If [[%1 7; exists this may also be written 


“ 
(i > rr(et)*a — +2) = K* 

4. Estimation theory for stochastic functions. In this section we shall treat 
the problem of applying statistical estimation to stochastic functions. We 
shall show how the classical maximum likelihood estimate can be adapted to 
the situation at hand, and illustrate by an example. 

Let $x(t)$ be a stochastic function with underlying probability space $\Omega$
and suppose that the measure on $\Omega$ is known to be one of a family,
$P_\alpha$, $\alpha \in A$. It is desired to estimate $\alpha$ on the basis of a
single observation of $x(t)$.

We shall assume that each $P_\alpha$ is absolutely continuous with respect to
some fixed probability measure $\mu$. Thus there exists a family of positive
$\mu$-measurable functions $f(\omega, \alpha)$ such that for every measurable
set $S$ in $\Omega$,

(12)    $P_\alpha(S) = \int_S f(\omega, \alpha)\, d\mu$.

In these circumstances it is appropriate to require that in addition to (*),
$x(t)$ be square integrable in $t$ almost everywhere $\mu$. Since we shall commit
ourselves to using only the $L^2$ properties of our realization, we may as well
assume that $f(\omega, \alpha)$ has the same value on realizations equal almost
everywhere $T$.$^2$
Each $P_\alpha$ generates a Karhunen representation


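(13)    $x(t) = \sum_{i=1}^{\infty} \dfrac{z_i(\omega, \alpha)}{\sqrt{\lambda_i(\alpha)}}\, \phi_i(t, \alpha) + m^{\alpha}(t)$.

Here the $z_i$ have a distribution known in terms of $P_\alpha$. Furthermore,
for different $\alpha$, these variables are related analogously to (7). We have,
in fact,

(14)    $\dfrac{z_j(\omega, \beta)}{\sqrt{\lambda_j(\beta)}} = \sum_{i=1}^{\infty} \dfrac{z_i(\omega, \alpha)}{\sqrt{\lambda_i(\alpha)}}\, a_{ij}(\alpha, \beta) + \int_T [m^{\alpha}(t) - m^{\beta}(t)]\, \phi_j(t, \beta)\, dt$,

where

        $a_{ij}(\alpha, \beta) = \int_T \phi_i(t, \alpha)\, \phi_j(t, \beta)\, dt$

and the equality holds almost everywhere $\mu$.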

Let $f_N(z_1, \cdots, z_N; \alpha)$ be the density function of $P_\alpha$ with
respect to $\mu$ on the sample space of $z_i(\omega, \alpha)$. Again by the
martingale convergence theorem [2], there follows:

THEOREM 4. If $A$ is separable and if $f(\omega, \alpha)$ depends continuously on
$\alpha$ almost everywhere $\mu$, then $f(\omega, \alpha)$ and, therefore, the
maximum likelihood estimate of $\alpha$ can be calculated from the $f_N$.

Example. Let $x(t)$ be a normally distributed stochastic function with known
covariance whose mean is to be estimated. We arbitrarily choose the dominating
measure $\mu$ to be that one corresponding to $m(t) = 0$.

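Since

        $x_j = \dfrac{z_j^{\alpha}}{\sqrt{\lambda_j}} + m_j^{\alpha}$,

the finite dimensional density function for the $x_j$ may be expressed in the
form

        $g_N(x_1, \cdots, x_N; \alpha) = \dfrac{\sqrt{\lambda_1 \cdots \lambda_N}}{(2\pi)^{N/2}} \exp\Big\{-\tfrac{1}{2} \sum_{j=1}^{N} \lambda_j (x_j - m_j^{\alpha})^2\Big\}$

so that the likelihood ratio is

        $\lim_{N \to \infty} f_N(x_1, \cdots, x_N; \alpha) = \lim_{N \to \infty} \exp\Big\{-\tfrac{1}{2} \sum_{j=1}^{N} \lambda_j \big[(x_j - m_j^{\alpha})^2 - x_j^2\big]\Big\}$

        $= \exp\Big\{\tfrac{1}{2} \sum_{j=1}^{\infty} \lambda_j \big(2 m_j^{\alpha} x_j - (m_j^{\alpha})^2\big)\Big\}$.

The likelihood ratio clearly achieves its maximum for

        $m_j^{\alpha} = x_j$,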


$^2$ What this amounts to, in effect, is to replace $f$ by $E\{f \mid z_1, z_2, \cdots$

The only possible maximum likelihood estimate is, therefore,

        $m^{\alpha}(t) = x(t)$.

It may well happen, however, that the realization $x(t)$ cannot serve as a mean.
For example, it follows from (*) that $m^{\alpha}(t)$ is continuous for every
$\alpha$ so that if the observed realization does not have this property, the
maximum likelihood estimate does not exist.
SOME REMARKS ON SAMPLING WITH REPLACEMENT
By DES RAJ AND SALEM H. KHAMIS
American University of Beirut


1. Introduction and summary. In order to estimate the mean of a finite
population from a random sample, the sample units may be selected in two 
ways. A pre-determined number m of units may be selected with replacement 
or sampling with replacement may be continued till a desired number n of 
distinct units is obtained. The first procedure is called sampling with replace- 
ment while the second may be called sampling without replacement. A com- 
parison is generally made between the two procedures with m = n. This com- 
parison, however, is not fair since costs are usually proportional to the number 
of distinct units and in the first procedure this number would be less than or 
equal to n. 

In the first procedure the population mean is usually estimated by the sample 
mean based on all the units in the sample including repetitions while in the 
second procedure the estimate is generally made to depend on the distinct 
units only. The object of this paper is to show that the estimate making use 
of only the distinct units is superior in either procedure. The following results 
are proved in this paper: 

(i) In sampling with replacement the estimate of the mean based on distinct 
units in the sample is superior to the estimate based on the total sample 
size when (a) the total sample size is fixed in advance, while the number 
of distinct units in the sample is a random variable, and, (b) when the 
total sample size is a random variable while the number of distinct units 
is fixed in advance. 

(ii) The same is true of ratio estimates. It is also shown that the bias is
numerically less if the ratio estimate is based on distinct units regardless of
whether these are fixed in advance or considered as random variables.
(iii) Expressions for the estimation of the variances of the various estimates 
considered in this paper are given. 
(iv) The above results are extended to multistage sampling. 


2. Statement of the problem. Let us consider a finite population consisting
of $N$ sampling units. Suppose we are interested in estimating the population
mean $\bar{Y}$ for a character $y$, from a sample selected with replacement with
equal probabilities. We consider the following two sampling schemes.

Scheme A. We select with replacement a total sample of size m fixed in advance. 
We denote the number of distinct sample units selected by the random vari- 
able u. 

Scheme B. We select with replacement a sample of n distinct units fixed in ad- 
vance. We denote the total sample size, including repetitions, by the random 

variable v. 
We consider Scheme A first. Let $a_1, a_2, \cdots, a_u$ be the $u$ distinct units
selected in the sample and let $k_r$ be the number of times the $r$th unit $a_r$
occurs in the sample, with $\sum k_r = m$. We compare the bias and variances of
the usual estimate

(1)    $\bar{y}_m = \dfrac{1}{m} \sum_{r=1}^{u} k_r y_r$,

and the estimate

(2)    $\bar{y}_u = \dfrac{1}{u} \sum_{r=1}^{u} y_r$,

based on the distinct units only, where $y_r$ is the value of the character $y$
for the $r$th distinct unit in the sample.

The estimate (1) is well known to be unbiased, and its variance is given by

        $V(\bar{y}_m) = \dfrac{1}{m}\Big(1 - \dfrac{1}{N}\Big)\sigma^2 = \Big(Q - \dfrac{1}{N}\Big)\sigma^2$,

where

        $\sigma^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2$,   with   $Q = \dfrac{1}{N} + \dfrac{N-1}{Nm}$.

For a given $u$ the expected value of $\bar{y}_u$ is $\bar{Y}$ so that
$\bar{y}_u$ is an unbiased estimate of $\bar{Y}$. With regard to the variance of
$\bar{y}_u$ we have

        $V(\bar{y}_u) = \Big[E\Big(\dfrac{1}{u}\Big) - \dfrac{1}{N}\Big]\sigma^2$.

The estimate $\bar{y}_u$ is superior to $\bar{y}_m$ if

(7)    $E\Big(\dfrac{1}{u}\Big) < Q$.

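The probability distribution of the random variable $u$ is given by (cf. Feller, [1])

(8)    $P(u) = N^{-m} \binom{N}{u} \Delta^{u} 0^{m}$,

where the $s$th difference of $0^r$ is defined by

        $\Delta^{s} 0^{r} = \sum_{j=0}^{s} (-1)^{j} \binom{s}{j} (s - j)^{r}$.

Hence

(9)    $E\Big(\dfrac{1}{u}\Big) = N^{-m} \sum_{u=1}^{m} \dfrac{1}{u} \binom{N}{u} \Delta^{u} 0^{m}$

and the expected sample size is

(10)    $E(u) = N^{-m} \sum_{u=1}^{m} u \binom{N}{u} \Delta^{u} 0^{m}$.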
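
The distribution (8) and the moments E(u) and E(1/u) can be evaluated exactly for small N and m. The following is a minimal sketch, not the authors' computation: the function names are hypothetical, and Q is taken to be 1/N + (N - 1)/(Nm), the value that matches the tabulated Q column and the right side of (21). Exact rational arithmetic is used so that the inequality E(1/u) ≤ Q can be compared without rounding.

```python
from fractions import Fraction
from math import comb

def diff_of_zero(s, r):
    """s-th difference of 0^r: sum_{j=0}^{s} (-1)^j C(s, j) (s - j)^r."""
    return sum((-1) ** j * comb(s, j) * (s - j) ** r for j in range(s + 1))

def prob_distinct(u, N, m):
    """P(u) from (8): probability of exactly u distinct units in m draws
    made with replacement from N units."""
    return Fraction(comb(N, u) * diff_of_zero(u, m), N ** m)

def table_row(N, m):
    """Return E(u), E(1/u), Q, 1/m and 1/E(u) as exact fractions."""
    us = range(1, min(N, m) + 1)
    e_u = sum(u * prob_distinct(u, N, m) for u in us)
    e_inv_u = sum(prob_distinct(u, N, m) / u for u in us)
    q = Fraction(1, N) + Fraction(N - 1, N * m)
    return e_u, e_inv_u, q, Fraction(1, m), 1 / e_u

if __name__ == "__main__":
    for N in (10, 50, 100):
        for m in range(1, 7):
            row = table_row(N, m)
            print(N, m, " ".join(f"{float(v):.3f}" for v in row))
```

For each (N, m) the printed row can be compared with the corresponding row of Table 1.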

We consider now Scheme B. Let $b_1, b_2, \cdots, b_n$ be the $n$ distinct units
selected in the sample and let $k_r$ be the number of times the $r$th unit $b_r$
occurs in the sample with $\sum k_r = v$. We compare the bias and variances of
the estimate

(11) 


and the estimate 


(12) 
mn; 


where $y_r$ is the same as in Scheme A with obvious modifications. It is easy to
see that the estimates $\bar{y}_v$ and $\bar{y}_n$ are unbiased for estimating
$\bar{Y}$. Also we have


(14) 
The estimate $\bar{y}_n$ is superior if
(15) 


The probability distribution of the random variable v can be shown to be 


n— 1 


Hence 


(17) E (‘) = (* - . _ Ly! A"'0 
v a v=n l 


We give in Table 1 a numerical table for selected sample sizes which illustrates 
the numerical magnitudes of the differences discussed above. The theoretical 
proofs are given in Section 3 below. 


3. Proofs of the inequalities. In order to establish inequality (7), we shall
first prove the following
LEMMA. Let

(18)    $S_{t,N} = \sum_{u=1}^{m-t} \dfrac{1}{u+t} \binom{N}{u} \Delta^{u} 0^{m-t}$.
TABLE 1

Values of E(u), E(1/u), Q and 1/E(u)

   N       m     E(u)    E(1/u)     Q       1/m     1/E(u)

 N = 10    1     1.00    1.000    1.000    1.000    1.000
           2     1.90     .550     .550     .500     .526
           3     2.71     .385     .400     .333     .369
           4     3.44     .303     .325     .250     .291
           5     4.10     .253     .280     .200     .244
           6     4.69     .221     .250     .167     .213

 N = 50    1     1.00    1.000    1.000    1.000    1.000
           2     1.98     .510     .510     .500     .505
           3     2.94     .343     .347     .333     .340
           4     3.88     .260     .265     .250     .258
           5     4.80     .210     .216     .200     .208
           6     5.71     .177     .183     .167     .175

 N = 100   1     1.00    1.000    1.000    1.000    1.000
           2     1.99     .505     .505     .500     .503
           3     2.97     .338     .340     .333     .337
           4     3.94     .255     .258     .250     .254
           5     4.90     .205     .208     .200     .204
           6     5.85     .172     .175     .167     .171
Then

(19)    $S_{t,N} \le (N + 1)\, S_{t+1,N}$   for $t = 0, 1, \cdots, m - 1$; $m > 2$,

the sign of equality holds only for $t = 0$.
PROOF.


Six = os (A*“*o"-** + A*o"-*™) 


= urt u 


m—t yr s , m—t—1 ‘ 
_ > N —- u - 1 ( N ) AY igt tt a .. (; atom t—1 
u=2 “> u  aailt 


( N— wu ‘ u \(*)aro" 
u+ic+ ] it+f u 


. > N + 1) + tN (*) avo t 1 


(u + t)(ua+t+id)\u 


m 


iM} 


Thus $S_{0,N} = (N + 1)\, S_{1,N}$, and $S_{t,N} < (N + 1)\, S_{t+1,N}$ for
$1 \le t \le m - 1$.
This proves the lemma.
COROLLARY. Applying the lemma $m - 2$ times beginning with $t = 1$, we have,
for $m > 2$,

        $S_{1,N} < (N + 1)^{m-2}\, S_{m-1,N} = (N + 1)^{m-2}\, \dfrac{N}{m}$,

so that

(20)    $S_{1,N-1} < N^{m-2}\, \dfrac{N - 1}{m}$.
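Now, inequality (7) is proved in the following
THEOREM.

(21)    $E\Big(\dfrac{1}{u}\Big) \le \dfrac{1}{N} + \dfrac{N - 1}{N} \cdot \dfrac{1}{m}$,

the sign of equality holds only for $m = 2$.
PROOF.

        $E\Big(\dfrac{1}{u}\Big) = N^{-m} \sum_{u=1}^{m} \binom{N}{u} \big(\Delta^{u} 0^{m-1} + \Delta^{u-1} 0^{m-1}\big)$

        $= \dfrac{1}{N} + N^{1-m} \sum_{u=1}^{m} \dfrac{1}{u} \binom{N-1}{u-1} \Delta^{u-1} 0^{m-1}$

        $= \dfrac{1}{N} + N^{1-m} \sum_{u=1}^{m-1} \dfrac{1}{u+1} \binom{N-1}{u} \Delta^{u} 0^{m-1}$

        $= \dfrac{1}{N} + N^{1-m}\, S_{1,N-1}$.

On using (20) we have for $m > 2$,

        $E\Big(\dfrac{1}{u}\Big) < \dfrac{1}{N} + \dfrac{N - 1}{N} \cdot \dfrac{1}{m}$.

For $m = 2$, $S_{1,N-1} = \tfrac{1}{2}(N - 1)$, so that the last inequality
reduces to an equality in this case.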
To prove inequality (15), we make use of the following
LEMMA. If $v$ is a positive random variable, then

(22)    $E\Big(\dfrac{1}{v}\Big) \ge \dfrac{1}{E(v)}$.

The proof of this lemma follows from Cauchy's inequality (cf. Hardy and
others, [3]),

(23)    $(\sum a^2)(\sum b^2) \ge (\sum ab)^2$,

by substituting $a = \sqrt{v P(v)}$, $b = \sqrt{v^{-1} P(v)}$, and noting that the
two sides of the inequality are convergent because $E(v)$ and $E(1/v)$ are finite.

Now (cf. Feller, [2])

        $E(v) = N\Big(\dfrac{1}{N} + \dfrac{1}{N - 1} + \cdots + \dfrac{1}{N - n + 1}\Big)$,


at l N—-n+1 N—n 
‘ie t= = ——______ — for mo 
( ) * n nN . n(N — 1) 
or
or 
or 


It is easy to see that for $n = 1$ the two procedures are equivalent and the
inequality (15) reduces to an equality.


4. Estimation of variances. We shall consider the problem of estimating
from the sample the variances of $\bar{y}_u$ and $\bar{y}_n$. Under Scheme A, it
is easy to see that for a given $u \ge 2$, an unbiased estimate of $\sigma^2$ is
provided by


Thus considering 


we have 
26) E|G,, | u = 2) Vig 
so that G, provides an unbiased estimate of the variance of 7, . It is unbiased in 


the conditional sense, namely when the number of distinct units in the sample 
exceeds unity. An alternative unbiased estimate is provided by G, where 


Gg’ ae ] ba ‘) 5 N —— l 2 
ee tes we eee 
ny)" 


ai) 


( Yu) > ior u Pa : a 
s-1¢ 


and 


for u i 


(28) |} s. °| = 1A ™ o 


where 
2 l . \8 
(29) 3? = > (yi — de) 
v—lia ~ : 
Thus 


(30) E E | = E () N + ; wv 


so that 1/v s, is an unbiased estimate of V(9,). 


5. Extension to ratio estimation. We shall now extend the above results to
ratio estimates. We make use of the notation and approximate results given by
Cochran [1]. The object is to estimate the population ratio $R = Y/X$. Under
Scheme A we compare the two estimates

(31)    $R_m = \sum_{i=1}^{u} k_i y_i \Big/ \sum_{i=1}^{u} k_i x_i$

and

(32)    $R_u = \sum_{i=1}^{u} y_i \Big/ \sum_{i=1}^{u} x_i$.


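We have

        $V(R_m) = R^2 \Big(Q - \dfrac{1}{N}\Big) [C_{yy} + C_{xx} - 2C_{xy}]$,

        $B(R_m) = R \Big(Q - \dfrac{1}{N}\Big) (C_{xx} - C_{xy})$,

        $V(R_u) = R^2 \Big[E\Big(\dfrac{1}{u}\Big) - \dfrac{1}{N}\Big] [C_{yy} + C_{xx} - 2C_{xy}]$,

        $B(R_u) = R \Big[E\Big(\dfrac{1}{u}\Big) - \dfrac{1}{N}\Big] (C_{xx} - C_{xy})$,

where $B(R)$ stands for the absolute value of the bias of the estimate $R$. Using
inequality (7), we have

(34)    $V(R_u) < V(R_m)$ and $B(R_u) < B(R_m)$.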


Under Scheme B, the estimates to be compared are 


(35) oe ee oe 


i=] 


and 


—~ 
ow 


(36) 


It is easy to see that the estimate $R_n$ is superior to $R_v$ from the point of
view of variance and bias.


6. Extension to multistage designs. The result obtained for unistage designs
will now be extended to multistage designs. Let a population consist of $N$
first stage sampling units, of which $m$ or $n$ are selected with equal
probabilities with replacement according to the Schemes A or B respectively. For
the $i$th first stage unit, let $t_i$ (based on sampling at second and subsequent
stages) be an unbiased estimate of $y_i$, the total value of the character $y$
for the unit. For Schemes A and B the unbiased estimates considered are


37) 7. ae 


i=l 


. . 
(38) in = 7 kts, 


m j=1 
or


and 


ms vy ix 
(39) iu tie, 


Mt i=l 


10 i. = : > kets. 


The variances of the estimates are given by 


5 wna l 2 
i(: . >) : (‘) 7 7 : 
B Q+ =| g 
o mN 
io 6 ] l ° 
(43) (7 = ' -_— Pe 
' ' Jn) nm : (: ‘) - 


ria! N — 1 ’ ] a3 2 & 
(44) VQ.) = a E (‘) (6 +o) + N’ 


where 


It 


(41) VQ.) 


ll 


$2) V (Gn) 


l N 
se = N » Vi(t,). 


Using inequalities (7) and (15) it is found that $\bar{y}_u$ is superior to
$\bar{y}_m$ while $\bar{y}_n$ is superior to $\bar{y}_v$.
A DISTRIBUTION-FREE UPPER CONFIDENCE BOUND FOR
Pr {Y < X}, BASED ON INDEPENDENT SAMPLES OF
X AND Y


By Z. W. BIRNBAUM AND R. C. McCARTY


University of Washington and Boeing Airplane Company


1. Summary. A solution for the problem of obtaining a distribution-free
one-sided confidence interval for $p = \Pr\{Y < X\}$ has been proposed in [1]. At
present a numerical procedure is given for computing the sample sizes needed
for such a confidence interval with given width and confidence level.


2. Introduction and formulation of the problem. The problem discussed in 
this paper arises in practical situations such as the following: structural com- 
ponents of a mechanism are mass-produced and each component has a strength 
at failure Y which, in view of unavoidable variability of the product, is a random 
variable. A component is then installed in a mechanism and exposed to a stress 
which reaches its maximum value X, again a random variable. If, due to chance, 
the values of Y and X are so paired off that Y < X, then the component fails 
in use. It is therefore of considerable importance to have an upper bound for 


(2.1)    $p = \Pr\{Y < X\}$.

Our problem is: Can $p$ be estimated from samples of $X$ and $Y$ alone and, in
particular, is there an upper confidence bound for $p$, i.e., a statistic $y$
based on a sample of $X$ and a sample of $Y$, such that for any
$\epsilon > 0$, $\alpha > 0$ there exists a pair of numbers
$M_{\epsilon,\alpha}$, $N_{\epsilon,\alpha}$, so that

(2.2)    $\Pr\{p \le y + \epsilon\} \ge 1 - \alpha$

when the sample of $X$ is of size $m \ge M_{\epsilon,\alpha}$ and the sample of
$Y$ of size $n \ge N_{\epsilon,\alpha}$?

The following answer to this question was proposed in [1]. 

We assume that $X$ and $Y$ are independent random variables with continuous
cumulative distribution functions $F(s) = \Pr\{X \le s\}$,
$G(s) = \Pr\{Y \le s\}$. Let $X_1 \le X_2 \le \cdots \le X_m$ be an ordered
sample of $X$ and $Y_1 \le Y_2 \le \cdots \le Y_n$ an ordered sample of $Y$, and
let $F_m(s)$, $G_n(s)$ be the empirical distribution functions corresponding to
these samples.

Using the Wilcoxon-Mann-Whitney statistic,

        $U$ = number of pairs $(X_i, Y_j)$ such that $Y_j < X_i$,

we write

(2.3)    $\hat{p} = U/mn$.
It is easily verified that $\hat{p}$ is an unbiased estimate of $p$ and

        $p = \int_{-\infty}^{+\infty} G(s)\, dF(s)$,

        $\hat{p} = \int_{-\infty}^{+\infty} G_n(s)\, dF_m(s)$,

        $p - \hat{p} = \int G\, d(F - F_m) + \int (G - G_n)\, dF_m = \int (F_m - F)\, dG + \int (G - G_n)\, dF_m$,

and

(2.5)    $p - \hat{p} \le D_m^{+} + D_n^{-}$,

where

        $D_m^{+} = \sup_{-\infty < s < +\infty} \{F_m(s) - F(s)\}$,

        $D_n^{-} = \sup_{-\infty < s < +\infty} \{G(s) - G_n(s)\}$.

It is well known [2] that

        $\Pr\{D_m^{+} \le v\} = \Pr\{D_m^{-} \le v\} = P_m(v)$

and

        $\Pr\{D_n^{-} \le v\} = P_n(v)$

are cumulative distribution functions which depend on the sample sizes $m$, $n$,
but not on the c.d.f.'s $F$ and $G$. It follows from (2.5) that

(2.6)    $\Pr\{p \le \hat{p} + \epsilon\} \ge \Pr\{D_m^{+} + D_n^{-} \le \epsilon\} = P_{m,n}(\epsilon)$

where $P_{m,n}(\epsilon)$ is the convolution of $P_m$ and $P_n$, hence does not
depend on $F$ and $G$. The statistic $\hat{p}$ has, therefore, the property
required of $y$ in (2.2) provided one can, for given $\epsilon$, $\alpha$,
determine numbers $M_{\epsilon,\alpha}$, $N_{\epsilon,\alpha}$ so that

(2.7)    $P_{m,n}(\epsilon) \ge 1 - \alpha$   for $m \ge M_{\epsilon,\alpha}$, $n \ge N_{\epsilon,\alpha}$.

Some further properties of $\hat{p}$ are discussed in [1].
A numerical procedure for computing $M_{\epsilon,\alpha}$,
$N_{\epsilon,\alpha}$ is presented in the next sections.

3. An approximate expression for $P_{m,n}(\epsilon)$. It was shown by
N. Smirnov [3] that

(3.1)    $\lim_{n \to \infty} \Pr\{D_n^{+} \le z/\sqrt{n}\} = \lim_{n \to \infty} P_n(z/\sqrt{n}) = 1 - e^{-2z^2}$.

Since, for fixed $n$, $P_n(z/\sqrt{n}) = H_n(z)$ is a cumulative distribution
function, and $L(z) = 1 - e^{-2z^2}$ is a continuous c.d.f., it follows by a
well-known argument (see, e.g., [4], p. 276) that $H_n(z) \to L(z)$ uniformly.
We may, therefore, conclude that

(3.2)    $\lim_{n \to \infty} [\Pr\{D_n^{+} \le v\} - L(v\sqrt{n})] = \lim_{n \to \infty} [H_n(v\sqrt{n}) - L(v\sqrt{n})] = 0$

uniformly for $0 \le v \le 1$. Writing


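(3.3)    $P_{m,n}(\epsilon) = \Pr\{D_m^{+} + D_n^{-} \le \epsilon\} = \int_0^{\epsilon} P_n(\epsilon - u)\, dP_m(u)$,

         $Q_{m,n}(\epsilon) = \int_0^{\epsilon} L[(\epsilon - u)\sqrt{n}]\, dL(u\sqrt{m})$,

we have

(3.4)    $|P_{m,n}(\epsilon) - Q_{m,n}(\epsilon)| \le \Big|\int_0^{\epsilon} \{P_n(\epsilon - u) - L[(\epsilon - u)\sqrt{n}]\}\, dP_m(u)\Big| + \Big|\int_0^{\epsilon} \{P_m(\epsilon - v) - L[(\epsilon - v)\sqrt{m}]\}\, dL(v\sqrt{n})\Big|$

         $\le \max_{0 \le u \le \epsilon} |P_n(\epsilon - u) - L[(\epsilon - u)\sqrt{n}]| + \max_{0 \le v \le \epsilon} |P_m(\epsilon - v) - L[(\epsilon - v)\sqrt{m}]|$,

which in view of (3.2) shows that

(3.5)    $\lim_{m,n \to \infty} |P_{m,n}(\epsilon) - Q_{m,n}(\epsilon)| = 0$

uniformly for $0 \le \epsilon \le 1$. This justifies the use of
$Q_{m,n}(\epsilon)$ as an approximation to $P_{m,n}(\epsilon)$ for $m$, $n$
sufficiently large. Some observations on the goodness of this approximation are
presented in Section 5.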

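By straightforward integration one obtains for $Q_{m,n}(\epsilon)$ the expression

(3.6)    $Q_{m,n}(\epsilon) = 1 - \dfrac{n}{m+n}\, e^{-2m\epsilon^2} - \dfrac{m}{m+n}\, e^{-2n\epsilon^2} - \dfrac{2\sqrt{2\pi}\, mn\epsilon}{(m+n)^{3/2}}\, e^{-2mn\epsilon^2/(m+n)} \cdot \dfrac{1}{\sqrt{2\pi}} \int_{-2n\epsilon/\sqrt{m+n}}^{2m\epsilon/\sqrt{m+n}} e^{-t^2/2}\, dt$.

4. Sample sizes $m$, $n$ which satisfy $Q_{m,n}(\epsilon) = 1 - \alpha$. With
the notations

(4.1)    $m + n = N$,   $m/(m+n) = \lambda$,   $n/(m+n) = 1 - \lambda$,   $\epsilon\sqrt{m+n} = \delta$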
TABLE I
Values $\delta_{\lambda,\alpha}$ such that $Q(\delta_{\lambda,\alpha}; \lambda) = 1 - \alpha$

                              λ
   α        .1        .2        .3        .4        .5

  .10     4.1185    3.2027    2.8501    2.6928    2.6468
  .05     4.6115    3.5667    3.1641    2.9844    2.9317
  .01     5.5700    4.2745    3.7770    3.5524    3.4870
  .005    5.9300    4.5405    4.0050    3.7665    3.6960
  .001    6.6800    5.0980    4.4880    4.2150    4.1360
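the expression (3.6) for $Q_{m,n}(\epsilon)$ may be written in the form

(4.2)    $Q(\delta; \lambda) = 1 - (1 - \lambda)\, e^{-2\lambda\delta^2} - \lambda\, e^{-2(1-\lambda)\delta^2} - 2\sqrt{2\pi}\, \lambda(1 - \lambda)\, \delta\, e^{-2\lambda(1-\lambda)\delta^2} \cdot \dfrac{1}{\sqrt{2\pi}} \int_{-2(1-\lambda)\delta}^{2\lambda\delta} e^{-t^2/2}\, dt$.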

Table I contains solutions $\delta_{\lambda,\alpha}$ of the equation

(4.3)    $Q(\delta; \lambda) = 1 - \alpha$

for $\alpha$ = .001, .005, .01, .05, .10, and $\lambda$ = .1 (.1) .5. These
solutions were obtained on a desk calculator, using the National Bureau of
Standards Tables of the Exponential Function [5], Descending Exponential [6],
and the Normal Distribution Function [7].

The use of the quantities $N$, $\lambda$, $\delta$ instead of the original $m$,
$n$, $\epsilon$ has not only the advantage of reducing the computations to a
table with double entry, but also makes it possible to design an experiment
with a given ratio $\lambda = m/N$. This ratio is often dictated by
considerations of cost or time.

Example. We wish to use four times as many $Y$'s as $X$'s, i.e., $\lambda = .2$,
and require $\epsilon = .10$, $\alpha = .05$. From Table I we have
$\delta_{.2,.05} = 3.5667$; hence, by (4.1), $(.10)\sqrt{N} = 3.5667$, and
$N = 1272.13$, $m = 254.43$, $n = 1017.70$. The rounded-up sample sizes are
therefore 255 for $X$, 1018 for $Y$.
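
The procedure of this section can be sketched in a few lines. The code below is an illustration only: the function names are hypothetical, the normal integral in (4.2) is evaluated through the error function rather than from tables, and the root of (4.3) is located by bisection, which relies on Q(δ; λ) being a distribution function in δ and hence non-decreasing.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def q_approx(delta, lam):
    """Q(delta; lam) as in (4.2), with lam = m/N and delta = eps*sqrt(N)."""
    mu = 1.0 - lam
    tails = (mu * math.exp(-2.0 * lam * delta ** 2)
             + lam * math.exp(-2.0 * mu * delta ** 2))
    band = std_normal_cdf(2.0 * lam * delta) - std_normal_cdf(-2.0 * mu * delta)
    cross = (2.0 * math.sqrt(2.0 * math.pi) * lam * mu * delta
             * math.exp(-2.0 * lam * mu * delta ** 2) * band)
    return 1.0 - tails - cross

def solve_delta(alpha, lam, hi=20.0, iters=200):
    """Root of Q(delta; lam) = 1 - alpha on [0, hi], by bisection."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_approx(mid, lam) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_sizes(delta, eps, lam):
    """Rounded-up (m, n) from (4.1): N = (delta/eps)^2, m = lam*N, n = (1-lam)*N."""
    total = (delta / eps) ** 2
    return math.ceil(lam * total), math.ceil((1.0 - lam) * total)

if __name__ == "__main__":
    print(solve_delta(0.05, 0.2))          # compare with the Table I entry
    print(sample_sizes(3.5667, 0.10, 0.2)) # the worked example
```

Calling `sample_sizes` with the tabulated value 3.5667 reproduces the worked example; `solve_delta` can be used for values of α and λ that Table I does not cover.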


5. Concluding remarks. The sample sizes computed for given $\lambda$,
$\epsilon$, $\alpha$ by the use of Table I are conservative, i.e., too large, for
two reasons. The first is that, instead of finding sample sizes $m$, $n$ such
that $P\{p \le \hat{p} + \epsilon\} = 1 - \alpha$, we used inequality (2.6) and
looked for $m$, $n$ satisfying $P_{m,n}(\epsilon) = 1 - \alpha$, a step which
certainly yields larger values. The second reason is that in equation
$P_{m,n}(\epsilon) = 1 - \alpha$ the exact expression $P_{m,n}(\epsilon)$ was
replaced by the approximate expression $Q_{m,n}(\epsilon)$ and then only $m$, $n$
were computed. This step was justified by (3.5) which, however, does not
indicate which way the sample sizes are affected. The following arguments are
offered in favor of the contention that the solutions $m$, $n$ of
$P_{m,n}(\epsilon) = 1 - \alpha$ differ little from those of
$Q_{m,n}(\epsilon) = 1 - \alpha$ and that the solutions of the second equation
are more conservative (greater) than those of the first.

The exact form of $P\{D_n^{+} \le v\}$ for finite $n$ is known and numerical
computations, some of which are reproduced in [8], show that already for
$n = 50$ the approximation of $P\{D_n^{+} \le v\}$ by $L(v\sqrt{n})$ is uniformly
very good. Since the sample sizes computed from Table I are in all practical
situations much larger than 50, (3.4) assures very close agreement between
$P_{m,n}(\epsilon)$ and $Q_{m,n}(\epsilon)$.

Furthermore, the following conjecture appears to be substantiated by
considerable numerical computations and some analytical considerations,
although no proof for it is available: for every integer $n \ge 1$ and for
$0 \le v \le 1$,

(5.1)    $L(v\sqrt{n}) = 1 - e^{-2nv^2} \le P\{D_n^{+} \le v\}$.

From (5.1) would follow that $P_{m,n}(\epsilon) \ge Q_{m,n}(\epsilon)$ for
$0 \le \epsilon \le 1$, hence $Q_{m,n}(\epsilon) = 1 - \alpha$ would yield sample
sizes larger than $P_{m,n}(\epsilon) = 1 - \alpha$.
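
The conjecture (5.1) can be explored numerically for small n. The sketch below is an illustration under an explicit assumption: the text states only that the exact finite-n distribution is known and does not reproduce it, so the closed form used here (the one-sided expression usually attributed to Birnbaum and Tingey) is supplied from outside the paper, and the function names are hypothetical.

```python
import math

def prob_dplus_exceeds(n, v):
    """P{D_n^+ > v} for 0 < v <= 1, using the assumed closed form
    v * sum_{j=0}^{[n(1-v)]} C(n,j) (1 - v - j/n)^(n-j) (v + j/n)^(j-1)."""
    top = min(int(math.floor(n * (1.0 - v))), n - 1)  # guard against rounding up to n
    total = 0.0
    for j in range(top + 1):
        total += (math.comb(n, j)
                  * (1.0 - v - j / n) ** (n - j)
                  * (v + j / n) ** (j - 1))
    return v * total

def conjecture_gap(n, v):
    """P{D_n^+ <= v} - L(v*sqrt(n)); (5.1) asserts this is non-negative."""
    exact = 1.0 - prob_dplus_exceeds(n, v)
    limit = 1.0 - math.exp(-2.0 * n * v * v)
    return exact - limit

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        gaps = [conjecture_gap(n, k / 20.0) for k in range(1, 21)]
        print(n, min(gaps), max(gaps))
```

For n = 1 and n = 2 the formula can be confirmed by hand from the order statistics of a uniform sample, which is what the small cases below rely on; the printed gaps for larger n are an exploration, not a proof.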
ON THE INTEGRODIFFERENTIAL EQUATION OF TAKÁCS. I.
By EDGAR REICH$^1$


University of Minnesota


1. Introduction. This paper is devoted to a study of certain aspects of the 
mixed-type Markov process n(t), originally treated by Takdcs [8]. It extends 
ind unifies a number of results of previous workers. 

Let N(t), N(O) = 0, t = 0 denote max {n|1t, < t}. We shall be especially 
interested in the case where 0 < 4; < t, < ---+ are the events of an (in general) 
non-homogeneous Poisson process of density A(t) 2 0. We assume that A(?) 
is Riemann integrable over all finite intervals. (The homogeneous Poisson proc- 
ess corresponds to A(t) = const.) Let xo, x1, x2, °** be a sequence of non-nega- 
tive random variables. Except in a part of Section 5, they are mutually inde- 
pendent, and independent of N(t); moreover, H(z) = Pr {x; S x} is the same 


for 7 = 1, 2,---. Introducing the notations 
‘ Nit 
: , , O,z2 = 0 
(t) dN(t) = xo-4 5 Liz) =<,” | F 
[ox et Ze x Le) = 17 So 
one may define (See Fig. 1) 
at pt 
(1.1) n(t) = | x(u) dN(u) — | L(n(u)) du 
= “0 


It is sometimes instructive to formally redefine $\chi(t)$ as a stochastic process
with $\chi(t)$, $\chi(t')$, $(t \ne t')$, independent, $\Pr\{\chi(t) \le x\} = H(x)$,
$t > 0$. One then concludes immediately, from the functional form of (1.1), that
$\eta(t)$ is a Markov process. Note that
$\operatorname{var}(\eta(t + \Delta t) - \eta(t)) = O((\Delta t)^2)$,
$t_i < t < t_{i+1}$, so that Feller's [5] function $a(t, x) = 0$.
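
To make the definition (1.1) concrete, the following Python sketch evaluates one fixed realization of the process from given jump epochs and jump sizes. It is an illustration only and not part of the paper: the function name `eta_at`, its arguments, and the convention that a jump occurring exactly at time t is not yet counted (in line with N(t) = max{n | t_n < t}) are assumptions made here.

```python
def eta_at(t, times, jumps, chi0=0.0):
    """Value of the process (1.1) at time t for one fixed realization.

    times -- increasing jump epochs t_1 < t_2 < ... (hypothetical input)
    jumps -- non-negative jump sizes chi_1, chi_2, ...
    chi0  -- initial level chi_0 at time 0
    """
    level, last = chi0, 0.0
    for t_i, chi_i in zip(times, jumps):
        if t_i >= t:
            # jumps at or after t are not counted yet
            break
        # drain at unit rate since the previous epoch, never below zero,
        # then add the jump
        level = max(0.0, level - (t_i - last)) + chi_i
        last = t_i
    # drain from the last counted epoch up to t
    return max(0.0, level - (t - last))
```

For example, with `times = [1.0, 2.5]`, `jumps = [2.0, 1.0]` and `chi0 = 0.5`, the path starts at 0.5, empties before the first jump, and is back at zero well before t = 5.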

In Section 2, the problem of finding the distribution of $\eta(t)$ will be reduced
to finding the unique solution of a Volterra equation of the second kind. In
Section 3, the corresponding result is found for the process $\eta^*(t)$, where, if
$t'$ is the first zero of $\eta(t)$,

$$\eta^*(t) = \begin{cases} \eta(t), & t < t' \\ 0, & t \ge t'. \end{cases}$$

The work in Sections 2-4 generalizes results of Beneš [2], who treated the Takács
process when $\lambda(t) = \text{const}$ (under somewhat milder restrictions on $H$).
Section 5 contains some results on the asymptotic nature of $\eta(t)$, derived from
a more general point of view than that employed in the preceding sections.


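[Fig. 1]

2. The Volterra equation for $\Pr\{\eta(t) = 0\}$.
Define $\Lambda(t) = \int_0^t \lambda(u)\,du$, $F(t, x) = \Pr\{\eta(t) \le x\}$,
$F(t) = F(t, 0) = \Pr\{\eta(t) = 0\}$, $\psi(s) = E e^{-s\chi_i}$, $i = 1, 2, \ldots$,
$\Phi(t, s) = E e^{-s\eta(t)}$, $\Phi(s) = \Phi(0, s) = E e^{-s\chi_0}$,
$(\Re s \ge 0)$. It then follows [8] that $F(t, x)$ is continuous, $t \ge 0$,
$x > 0$, and$^2$

$$\frac{\partial F(t, x)}{\partial t} = \frac{\partial F(t, x)}{\partial x}
- \lambda(t) F(t, x) + \lambda(t) \int_0^x H(x - y)\, d_y F(t, y).$$

Consequently,

$$\Phi(t, s) + s \int_0^t e^{(t-u)s - [1 - \psi(s)][\Lambda(t) - \Lambda(u)]} F(u)\,du
= \Phi(s)\, e^{ts - [1 - \psi(s)]\Lambda(t)}, \qquad \Re s \ge 0. \tag{2.1}$$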


Thus, if $F(t)$ is known, $F(t, x)$ can be computed by quadratures. Equation (2.1)
contains two unknown functions, $F(t)$ and $\Phi(t, s)$, which might a priori lead
one to believe that, unless the explicit relation between the two functions were
brought into the picture, neither could be uniquely determined from (2.1) alone.
However, by taking advantage of the regularity properties of $\Phi$ (and certain
additional regularity properties of $\lambda$, $H$), it turns out that (2.1)
actually determines $F(t)$, and hence also $\Phi(t, s)$, uniquely. (Cf. Bailey [1],
where regularity properties are used to solve a functional equation containing two
unknown functions. See also [9], pp. 52-53.)

THEOREM 1. Suppose (i) $\lambda(t) \in L^2$ for every finite interval,
(ii) $H(x) = \int_0^x h(\xi)\,d\xi$, $e^{-cx} h(x) \in L^2(0, \infty)$ for some
$c \ge 0$. Then $F(t)$ is the unique continuous solution of the Volterra equation
of the second kind

$$g'(t) = F(t) + \int_0^t K(t, u) F(u)\,du, \tag{2.2}$$

where

$$K(t, u) = \frac{1}{2\pi i} \frac{\partial}{\partial t}\, \mathrm{P.V.}
\int_{x - i\infty}^{x + i\infty} e^{(t-u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]}
\frac{ds}{s},$$

$$g'(t) = \frac{1}{2\pi i} \frac{d}{dt} \int_{x - i\infty}^{x + i\infty}
\Phi(s)\, e^{ts - \Lambda(t)[1 - \psi(s)]} \frac{ds}{s^2}, \qquad x > c, \; t \ge 0.$$

$^2$ Two functions of $t$ will be written as equal, if they exist and are the same
for almost all $t > 0$.


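LEMMA 2.1. If $x > 0$, $M > 0$, $-\infty < y \le y_0$, then
$\frac{1}{2\pi i} \int_{x - iM}^{x + iM} e^{ys}\, \frac{ds}{s}$ is bounded uniformly
with respect to $M$, $y$.

PROOF. Let $C_y$ be the rectangular contour bounded by $s = x \pm iM$, and parallel
lines extending to $\infty$ in the right (left) half plane if $y < 0$ ($y > 0$). Then

$$\frac{1}{2\pi i} \int_{C_y} e^{ys}\, \frac{ds}{s} = \frac{1 + \operatorname{sgn} y}{2},$$

$$\left| \frac{1}{2\pi i} \int_{x - iM}^{x + iM} e^{ys}\, \frac{ds}{s}
- \frac{1}{2\pi i} \int_{C_y} e^{ys}\, \frac{ds}{s} \right|
\le \frac{K_1}{|y| M} \le K_1, \qquad \text{if } |yM| \ge 1.$$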
On the other hand, if | yM | < 1, then 
| 
| ez+iM -z+iM es | 
| 1 a. < s é or 
2m J iM 2 |\2m2 Jim s 
-z+iM e 
SMe YW! | ds| = 2K2|y7| M S 2K. 
z—iM 8s 


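LEMMA 2.2. If $x > 0$, $M > 0$, $0 \le \alpha \le \alpha_0$, $r(t) \in L(0, \infty)$,
$R(s) = \int_0^\infty e^{-st} r(t)\,dt$, then
$\frac{1}{2\pi i} \int_{x - iM}^{x + iM} R(s) e^{\alpha s}\, \frac{ds}{s}$ is bounded
uniformly with respect to $M$, $\alpha$.

PROOF. Since the integral for $R(s)$ converges uniformly on the line $\Re s = x$,

$$L = \frac{1}{2\pi i} \int_{x - iM}^{x + iM} R(s) e^{\alpha s}\, \frac{ds}{s}
= \int_0^\infty r(t) \left\{ \frac{1}{2\pi i} \int_{x - iM}^{x + iM}
e^{(\alpha - t)s}\, \frac{ds}{s} \right\} dt.$$

Hence, by Lemma 2.1, $|L| \le K_3 \int_0^\infty |r(t)|\,dt$.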
PROOF OF THEOREM 1. Dividing both sides of (2.1) by $s^2$, and integrating along
the line $\Re s = x > c$ from $s = x - iM_1$ to $s = x + iM_2$, $(M_1, M_2 > 0)$,
we have

$$\frac{1}{2\pi i} \int_{x - iM_1}^{x + iM_2} \Phi(t, s) \frac{ds}{s^2}
+ \int_0^t \left\{ \frac{1}{2\pi i} \int_{x - iM_1}^{x + iM_2}
e^{(t-u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]} \frac{ds}{s} \right\} F(u)\,du
= \frac{1}{2\pi i} \int_{x - iM_1}^{x + iM_2} \Phi(s)\,
e^{ts - \Lambda(t)[1 - \psi(s)]} \frac{ds}{s^2}, \qquad t \ge 0.$$

Since $\Phi(t, s)$ is regular, $|\Phi(t, s)| \le 1$, when $\Re s > 0$, the first
integral on the left side converges (absolutely) to zero as $M_1, M_2 \to \infty$,
and the integral on the right side converges absolutely to the function

$$g(t) = \frac{1}{2\pi i} \int_{x - i\infty}^{x + i\infty} \Phi(s)\,
e^{ts - \Lambda(t)[1 - \psi(s)]} \frac{ds}{s^2}.$$

Hence

$$\lim_{M_1, M_2 \to \infty} \int_0^t \left\{ \frac{1}{2\pi i}
\int_{x - iM_1}^{x + iM_2} e^{(t-u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]}
\frac{ds}{s} \right\} F(u)\,du = g(t), \qquad t \ge 0, \; x > c. \tag{2.3}$$


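In particular, we shall henceforth take $M_1 = M_2 = M$, in order to make it
possible to invert the order of integration in (2.3). We can write

$$\frac{1}{2\pi i} \int_{x - iM}^{x + iM} e^{\alpha s - \beta[1 - \psi(s)]} \frac{ds}{s}
= \frac{e^{-\beta}}{2\pi i} \int_{x - iM}^{x + iM} e^{\alpha s}
\left[ e^{\beta\psi(s)} - 1 - \beta\psi(s) \right] \frac{ds}{s}
+ \frac{e^{-\beta}}{2\pi i} \int_{x - iM}^{x + iM} e^{\alpha s} \frac{ds}{s}
+ \frac{\beta e^{-\beta}}{2\pi i} \int_{x - iM}^{x + iM} \psi(s) e^{\alpha s} \frac{ds}{s}
= I + II + III.$$

By the Riemann-Lebesgue Lemma, $\lim_{|y| \to \infty} |\psi(x + iy)| = 0$. By
Parseval's equality,

$$\int_{-\infty}^{\infty} |\psi(x + iy)|^2\,dy
= 2\pi \int_0^\infty e^{-2x\tau} [h(\tau)]^2\,d\tau < \infty.$$

Hence,

$$|I| \le \beta K e^{\alpha x} \int_{x - i\infty}^{x + i\infty} |\psi(s)|^2
\frac{|ds|}{|s|} < \infty.$$

Therefore, as $M \to \infty$, $I$ converges absolutely, and uniformly with respect
to $\alpha$, $\beta$, $|\alpha| \le \alpha_0$, $|\beta| \le \beta_0$. By Lemma 2.1,
$\lim_{M \to \infty} II = e^{-\beta}$, boundedly with respect to $M > 0$,
$0 \le \alpha \le \alpha_0$. By Lemma 2.2,
$\lim_{M \to \infty} III = \beta e^{-\beta} H(\alpha)$, boundedly with respect to
$M > 0$, $0 \le \alpha \le \alpha_0$. Thus we may rewrite (2.3) as the Volterra
equation of the first kind,

$$\int_0^t G(t, u) F(u)\,du = g(t), \qquad t \ge 0, \tag{2.4}$$

where

$$G(t, u) = \frac{1}{2\pi i} \int_{x - i\infty}^{x + i\infty}
e^{\alpha s - \beta[1 - \psi(s)]} \frac{ds}{s} = \rho(\alpha, \beta) + \sigma(\alpha, \beta),$$

$$\rho(\alpha, \beta) = e^{-\beta} \left\{ \frac{1}{2\pi i}
\int_{x - i\infty}^{x + i\infty} e^{\alpha s}
\left[ e^{\beta\psi(s)} - 1 - \beta\psi(s) \right] \frac{ds}{s} + 1 \right\},$$

$$\sigma(\alpha, \beta) = \beta e^{-\beta} H(\alpha), \qquad
\alpha = t - u \ge 0, \quad \beta = \Lambda(t) - \Lambda(u).$$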


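Next we deal with the question of the existence and nature of the derivative
$g'(t)$ for almost all $t \ge 0$. First we focus on the existence and nature of

$$\frac{d}{dt} \int_0^t \rho(\alpha, \beta) F(u)\,du.$$

Since $\rho(\alpha, \beta) F(u)$ is continuous in $t$, $u$, and $\rho(0, 0) = 1$,

$$\frac{d}{dt} \int_0^t \rho(\alpha, \beta) F(u)\,du
= F(t) + \left[ \frac{d}{dt} \int_0^T \rho(\alpha, \beta) F(u)\,du \right]_{T = t}.$$

The partial derivatives $\rho_\alpha$, $\rho_\beta$ exist and are uniformly
continuous in $0 \le \alpha \le \alpha_0$, $0 \le \beta \le \beta_0$. Thus
$\Delta\rho = \rho_\alpha \Delta\alpha + \rho_\beta \Delta\beta
+ \epsilon_1 \Delta\alpha + \epsilon_2 \Delta\beta$,
$\lim_{\Delta\alpha, \Delta\beta \to 0} \epsilon_i = 0$, uniformly bounded with
respect to $\alpha$, $\beta$, $0 \le \alpha \le \alpha_0$, $0 \le \beta \le \beta_0$.
We have

$$\frac{\Delta\rho}{\Delta t} = (\rho_\alpha + \epsilon_1)
+ (\rho_\beta + \epsilon_2) \frac{1}{\Delta t} \int_t^{t + \Delta t} \lambda(v)\,dv.$$

Let $E = \{t \mid \Lambda'(t) = \lambda(t)\}$. We see that for $t \in E$,
$\Delta\rho/\Delta t$ is bounded uniformly with respect to $u$, and
$\partial\rho/\partial t = \rho_\alpha + \rho_\beta \lambda(t)$. Hence, by the
bounded convergence theorem,

$$\frac{d}{dt} \int_0^T \rho(\alpha, \beta) F(u)\,du
= \int_0^T \frac{\partial\rho}{\partial t} F(u)\,du,$$

and therefore

$$\frac{d}{dt} \int_0^t \rho(\alpha, \beta) F(u)\,du
= F(t) + \int_0^t \frac{\partial\rho}{\partial t} F(u)\,du
= F(t) + \int_0^t \rho_\alpha F(u)\,du + \lambda(t) \int_0^t \rho_\beta F(u)\,du. \tag{2.6}$$

We see, by (2.6), that

$$\frac{\partial\rho}{\partial t} \in L^2(A), \qquad A = \{(t, u) \mid 0 \le u \le t\}.$$

Also, $\int_0^t (\partial\rho/\partial t) F(u)\,du \in L^2$ (for every finite
interval). Next, consider

$$\int_0^t \sigma(\alpha, \beta) F(u)\,du.$$

By noting the fact ([4], pp. 111 ff.) that for continuous $Q(u)$, and $h \in L^2$,

$$\frac{d}{dt} \int_0^t Q(u) H(t - u)\,du = \int_0^t Q(u) h(t - u)\,du
= \text{continuous function of } t,$$

for all $t$, one finds that
$\frac{d}{dt} \int_0^t \sigma(\alpha, \beta) F(u)\,du
= \int_0^t (\partial\sigma/\partial t) F(u)\,du \in L^2$, with
$\partial\sigma/\partial t \in L^2(A)$. Thus (2.2) holds for almost all $t$, with

$$K(t, u) = \frac{\partial G(t, u)}{\partial t}
= \frac{\partial\rho}{\partial t} + \frac{\partial\sigma}{\partial t},$$

and $g'(t) \in L^2$ over every finite interval. Under these conditions it is known
[7] that (2.2) has a unique $L^2$ solution, $F(t)$; in particular, there is a unique
continuous solution.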
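
Once a kernel is available in numerical form, a second-kind Volterra equation of the shape (2.2) can be solved by stepping forward in t. The sketch below is a generic trapezoidal scheme and not the author's method; the function name `solve_volterra2` and its arguments are assumptions. The kernel of Theorem 1 is a contour integral that is not evaluated here, so the usage example relies on a toy kernel with a known answer.

```python
import math

def solve_volterra2(f, kernel, t_max, n):
    """Trapezoidal solution of f(t) = y(t) + int_0^t kernel(t, u) y(u) du.

    f, kernel -- callables (hypothetical inputs); t_max, n -- grid end and steps.
    Returns the grid and the approximate solution on it.
    """
    h = t_max / n
    ts = [k * h for k in range(n + 1)]
    ys = [f(0.0)]                       # at t = 0 the integral vanishes
    for k in range(1, n + 1):
        t = ts[k]
        # trapezoid weights: 1/2 at u = 0, 1 at interior nodes
        acc = 0.5 * kernel(t, ts[0]) * ys[0]
        acc += sum(kernel(t, ts[j]) * ys[j] for j in range(1, k))
        # the node u = t carries weight 1/2 and the unknown y(t)
        ys.append((f(t) - h * acc) / (1.0 + 0.5 * h * kernel(t, t)))
    return ts, ys

# Toy check: f = 1, kernel = 1 gives y(t) = exp(-t).
ts, ys = solve_volterra2(lambda t: 1.0, lambda t, u: 1.0, 1.0, 200)
```

The implicit treatment of the end node keeps the scheme second-order accurate in the step size for smooth kernels.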
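3. The Volterra equation for $\Pr\{\eta^*(t) = 0\}$.
Define $B(t, x) = \Pr\{\eta^*(t) \le x\}$,
$B(t) = \Pr\{\eta^*(t) = 0\} = \Pr\{t' \le t\}$,
$\Phi^*(t, s) = E e^{-s\eta^*(t)}$, $\Phi(s) = E e^{-s\eta^*(0)} = E e^{-s\chi_0}$.
Then [2]

$$\frac{\partial B(t, x)}{\partial t} = \frac{\partial B(t, x)}{\partial x}
- \lambda(t) B(t, x) + \lambda(t) \int_{0-}^{x} B(t, x - y)\,dH(y)
+ \lambda(t) [1 - H(x)] B(t).$$

Hence

$$\Phi^*(t, s) + \int_0^t \{ s - \lambda(u)[1 - \psi(s)] \}\,
e^{s(t-u) - [1 - \psi(s)][\Lambda(t) - \Lambda(u)]} B(u)\,du
= \Phi(s)\, e^{st - [1 - \psi(s)]\Lambda(t)}, \qquad \Re s \ge 0. \tag{3.1}$$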

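THEOREM 2. Under the same assumptions on $\lambda$ and $H$ as for Theorem 1, $B(t)$
is the unique continuous solution of the Volterra equation of the second kind

$$g'(t) = B(t) + \int_0^t K^*(t, u) B(u)\,du, \tag{3.2}$$

where

$$K^*(t, u) = \frac{1}{2\pi i} \frac{\partial}{\partial t}\, \mathrm{P.V.}
\int_{x - i\infty}^{x + i\infty} \{ s - \lambda(u)[1 - \psi(s)] \}\,
e^{(t-u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]} \frac{ds}{s^2}.$$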


Proor. The proof proceeds as for Theorem 1, except that, before differentia- 
tion, the kernel now contains an additional term of the type 


l a ee 2—B|1—¥(2))} AS 
p*(a, 8) = — ro. [1 v(s) Je’ li iain — 
“71 /z—ix = 

l er+i2 a 5 . ls 

+ ctf — w(s)]le* — 1] — By(s)} > 


+ (8—1)e* | H(r) dr. 


“0 
This expression is treated in the same manner as p(a, 8) was treated. 

4. $\psi(s)$ Regular at infinity. We shall briefly remark on the practically
important case when $\psi(s)$ is regular at infinity (e.g. when $\psi(s)$ is
rational). This assumption regarding $\psi$ is more restrictive than the
assumptions in the hypotheses of Theorems 1 and 2, because by Pincherle's Theorem
([4], pg. 263),

$$\psi(s) = a_1 s^{-1} + a_2 s^{-2} + \cdots$$

is the Laplace transform of a density

$$h(t) = \frac{1}{2\pi i} \int_\Gamma e^{ts} \psi(s)\,ds, \qquad t > 0,$$

where $\Gamma$ is a contour, on and outside of which $\psi(s)$ is regular. In particular, if
a . : ae ee on 

y is a rectangle on which Rs < 6 > O, then we see that | A(t)| S Ke. Thus 

one may choose c = 0. 
Instead of multiplying (2.1) (or (3.1)) by the "convergence factor" $s^{-2}$, it is
sometimes more convenient to use $(s + \mu)^{-2}$, $\mu > 0$. For example, the kernel
$G(t, u)$ of (2.4) then becomes

$$G(t, u) = \frac{1}{2\pi i} \int_\Gamma s (s + \mu)^{-2}\,
e^{(t-u)s - [\Lambda(t) - \Lambda(u)][1 - \psi(s)]}\,ds, \tag{4.1}$$

and $K(t, u) = (\partial/\partial t) G(t, u)$. For instance, if
$\psi(s) = (1 + s)^{-1}$, and if we choose $\mu = 1$, then, if $I_n$ is the modified
Bessel function,

$$G(t, u) = \begin{cases}
e^{-(t-u) - [\Lambda(t) - \Lambda(u)]} \left\{
I_0\!\left[ 2 (t-u)^{1/2} (\Lambda(t) - \Lambda(u))^{1/2} \right]
- (t-u)^{1/2} (\Lambda(t) - \Lambda(u))^{-1/2}\,
I_1\!\left[ 2 (t-u)^{1/2} (\Lambda(t) - \Lambda(u))^{1/2} \right] \right\},
& \text{if } \Lambda(t) \ne \Lambda(u), \\[4pt]
e^{-(t-u)} \left[ 1 - (t-u) \right], & \text{if } \Lambda(t) = \Lambda(u).
\end{cases}$$



This is rather similar to the kernel encountered by Clarke [3] by a completely 
different approach. 
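
The Bessel-function kernel for $\psi(s) = (1 + s)^{-1}$, $\mu = 1$ can be evaluated with nothing more than the power series of $I_0$ and $I_1$. The sketch below is an illustration under the reading $G = e^{-(\alpha+\beta)}\{I_0(2\sqrt{\alpha\beta}) - \sqrt{\alpha/\beta}\, I_1(2\sqrt{\alpha\beta})\}$ with $\alpha = t - u$ and $\beta = \Lambda(t) - \Lambda(u)$; the function name `G_example` and the fixed number of series terms are assumptions. Writing $\sqrt{\alpha/\beta}\, I_1$ as $\alpha \sum_k (\alpha\beta)^k / (k!\,(k+1)!)$ removes the division by $\beta$, so the case $\Lambda(t) = \Lambda(u)$ needs no separate branch.

```python
import math

def G_example(alpha, beta, terms=60):
    """Kernel value for psi(s) = 1/(1+s), mu = 1 (illustrative sketch).

    alpha = t - u >= 0, beta = Lambda(t) - Lambda(u) >= 0.
    """
    z = alpha * beta
    i0 = 0.0       # accumulates I_0(2 sqrt(z)) = sum z^k / (k!)^2
    i1 = 0.0       # accumulates sum z^k / (k! (k+1)!)
    term0 = 1.0    # current z^k / (k!)^2
    term1 = 1.0    # current z^k / (k! (k+1)!)
    for k in range(terms):
        i0 += term0
        i1 += term1
        term0 *= z / ((k + 1) ** 2)
        term1 *= z / ((k + 1) * (k + 2))
    # sqrt(alpha/beta) * I_1(2 sqrt(alpha*beta)) equals alpha * i1
    return math.exp(-(alpha + beta)) * (i0 - alpha * i1)
```

With `beta = 0` the series collapse to their first terms and the function returns $e^{-\alpha}(1 - \alpha)$, which is the second branch of the displayed kernel.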


5. Asymptotic behavior of $\eta(t)$. Unless specifically stated, no restrictions
regarding the distribution, or independence of the sequences $\{\chi_i\}$,
$\chi_i \ge 0$, and $\{t_n\}$, $0 < t_1 < t_2 < \cdots$, shall be made in this
section. Therefore, we cannot use the results of Sections 2-4, but must return to
the fundamental relation (1.1).

LEMMA 5.1.

$$\eta(t) = \sup_{x > 0} \left\{ \int_{t-x}^{t} \chi(u)\,dN(u) - x \right\}.$$

PROOF. Let $y = \max\{u \mid u \le t,\ \eta(u) = 0\}$. Then

$$\eta(t) = \int_y^t \chi(u)\,dN(u) - (t - y).$$

On the other hand,

$$\eta(t) = \eta(t - x) + \int_{t-x}^{t} \chi(u)\,dN(u) - \int_{t-x}^{t} L(\eta(u))\,du
\ge \int_{t-x}^{t} \chi(u)\,dN(u) - x.$$
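
The supremum representation of Lemma 5.1 can be checked on a fixed realization. The sketch below compares a step-by-step evaluation of (1.1) with the supremum taken over windows that start at a jump epoch, where the supremum is approached. Both function names, the treatment of $\chi_0$ as a jump at time 0, and the convention that a jump exactly at t is not counted are assumptions made for illustration.

```python
def eta_recursive(t, times, jumps, chi0=0.0):
    """Direct evaluation of (1.1): drain at unit rate, add jumps before t."""
    level, last = chi0, 0.0
    for t_i, chi_i in zip(times, jumps):
        if t_i >= t:
            break
        level = max(0.0, level - (t_i - last)) + chi_i
        last = t_i
    return max(0.0, level - (t - last))

def eta_sup(t, times, jumps, chi0=0.0):
    """Supremum form of Lemma 5.1, scanning windows that start at a jump."""
    epochs = [0.0] + [s for s in times if s < t]
    sizes = [chi0] + list(jumps[:len(epochs) - 1])
    best = 0.0   # limit x -> 0+, where the window contains no jump
    tail = 0.0
    for s, chi in zip(reversed(epochs), reversed(sizes)):
        tail += chi                        # total jump mass from s up to t
        best = max(best, tail - (t - s))   # candidate with x = t - s
    return best
```

Between jump epochs the bracketed expression only decreases as x grows, which is why scanning the epochs is enough.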


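THEOREM 3. If $N(t) = \lambda t + o(t)$ as $t \to \infty$,
$\sum_{i=1}^{n} \chi_i = an + o(n)$, as $n \to \infty$, $\lambda a \le 1$, then
$\eta(t) = o(t)$.

PROOF. We note first that the hypothesis implies that if $0 < y \le 1$, then

$$\lim_{t \to \infty} t^{-1} \left| \int_0^{yt} \chi(u)\,dN(u) - y\lambda a t \right| = 0,$$

uniformly with respect to $y$. Let $\delta, \epsilon > 0$ be given. Then if
$0 < x \le (1 - \delta)t$, there exists a $T_{\epsilon,\delta}$, such that

$$t^{-1} \left[ \int_{t-x}^{t} \chi(u)\,dN(u) - x \right]
= t^{-1} \left[ \int_0^t \chi(u)\,dN(u) - \int_0^{t-x} \chi(u)\,dN(u) \right] - (x/t)
\le \lambda a - (1 - x/t)\lambda a + \epsilon - x/t \le \epsilon,$$

if $t > T_{\epsilon,\delta}$. On the other hand, if $x > (1 - \delta)t$, there
exists a $T_\epsilon$ such that

$$t^{-1} \left[ \int_{t-x}^{t} \chi(u)\,dN(u) - x \right]
\le t^{-1} \int_0^t \chi(u)\,dN(u) - x/t
\le \lambda a + \epsilon - (1 - \delta) \le \epsilon + \delta,$$

if $t > T_\epsilon$. Hence $\eta(t)/t \le \epsilon + \delta$ if
$t > \max(T_\epsilon, T_{\epsilon,\delta})$.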

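COROLLARY. If $N(t)$ is a Poisson process with cumulative mass
$\Lambda(t) = \lambda t + o(t)$ as $t \to \infty$,
$\sum_{i=1}^{n} \chi_i = an + o(n)$, as $n \to \infty$, $\lambda a \le 1$, then
$\eta(t) = o(t)$ with probability 1.

PROOF. If $\Lambda(\infty) < \infty$, the result is trivial, as then
$\eta(t) = O(1)$, with probability one. Assume $\Lambda(\infty) = \infty$. Let
$N^*(t, \omega)$, $\omega \in \Omega$, be a homogeneous Poisson process with unit
density. Then $N(t, \omega) = N^*(\Lambda(t), \omega)$ is a Poisson process with
density $\lambda(t)$. Hence

$$\lim_{t \to \infty} \frac{N(t, \omega)}{t}
= \lim_{t \to \infty} \frac{N^*(\Lambda(t), \omega)}{\Lambda(t)} \cdot \frac{\Lambda(t)}{t}
= \lambda, \qquad \text{a.e. } \omega.$$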
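
The time change $N(t, \omega) = N^*(\Lambda(t), \omega)$ used in the proof also gives a simple way to simulate a non-homogeneous Poisson process. The following is a minimal sketch assuming that $\Lambda$ has a known inverse; the function name, its arguments and the example density $\lambda(t) = 2t$ are illustrative choices and do not come from the paper.

```python
import math
import random

def nonhomogeneous_poisson_times(Lambda_inv, Lambda_T, rng):
    """Event times on [0, T] via N(t) = N*(Lambda(t)), N* of unit density.

    Lambda_inv -- inverse of the cumulative mass Lambda (assumed available)
    Lambda_T   -- Lambda(T), the cumulative mass at the end of the horizon
    rng        -- a random.Random instance
    """
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)       # next event of the unit-rate process
        if s > Lambda_T:
            return times
        times.append(Lambda_inv(s))     # map back to the original clock

# Example: lambda(t) = 2t, so Lambda(t) = t^2 and Lambda^{-1}(s) = sqrt(s).
events = nonhomogeneous_poisson_times(math.sqrt, 9.0, random.Random(0))
```

Because the inverse of $\Lambda$ is increasing, the mapped times come out already sorted.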


The following result follows from results of Kiefer and Wolfowitz [6], after
some elementary transformations.

THEOREM 4. Suppose $\Lambda(t) = \lambda t + O(1)$, as $t \to \infty$. If
$\{\chi_i\}$ are independent of each other and $N(t)$, and are equidistributed, and
if $\lambda E\chi_i < 1$, $E\chi_i^2 < \infty$, then

$$E\eta(t_n + 0) = O(1), \qquad \text{as } n \to \infty.$$

The hypothesis on $\Lambda(t)$ is satisfied, e.g., if $\lambda(t)$ is periodic with
mean $\lambda$. It may be shown, by counterexample, that the conclusion of Theorem 4
becomes false if the hypothesis is weakened to $\Lambda(t) = \lambda t + o(t)$.
NOTES


PROBABILITIES OF HYPOTHESES AND INFORMATION-STATISTICS
IN SAMPLING FROM EXPONENTIAL-CLASS
POPULATIONS


By Morton Kupperman
The George Washington University


1. Summary. This paper is concerned with inequalities connecting probabilities 
of hypotheses using Bayes’ theorem (a posteriori probabilities), a priori prob- 
abilities, and Kullback-Leibler information-statistics in sampling from popu- 
lations belonging to the exponential class of populations. As a corollary, it is 
shown that if it is known that the a priori probabilities are all equal, the choice 
of the hypothesis with the minimum Kullback-Leibler information-statistic is 
the same as the choice of the hypothesis with the maximum a posteriori prob- 
ability, and conversely. 


2. Introduction. Suppose that an event $E$ can occur only if one of the set of $r$
exhaustive and incompatible (mutually exclusive) events $H_1, H_2, \ldots, H_r$
occurs. The a priori probabilities of these latter events (which we may call
hypotheses) are denoted by $\alpha_1, \alpha_2, \ldots, \alpha_r$ respectively,
where $\alpha_m > 0$ and $\sum_{m=1}^{r} \alpha_m = 1$. The conditional
probabilities for $E$ to occur, assuming the occurrence of $H_m$, are denoted by
$p(E \mid H_m)$, $m = 1, 2, \ldots, r$. The a posteriori probabilities of $H_m$,
given that $E$ has occurred, are denoted by $p(H_m \mid E)$. Bayes' theorem (see,
for example, Uspensky [16]) states that

$$p(H_m \mid E) = \frac{\alpha_m\, p(E \mid H_m)}{\sum_{j=1}^{r} \alpha_j\, p(E \mid H_j)},
\qquad \text{for } m = 1, 2, \ldots, r.$$
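
As a small numerical illustration of Bayes' theorem in the form just stated, the sketch below turns a priori probabilities and conditional probabilities into a posteriori probabilities. The function name `posterior` and the numbers in the usage line are made up for the example.

```python
def posterior(priors, likelihoods):
    """A posteriori probabilities p(H_m | E) from alpha_m and p(E | H_m)."""
    joint = [a * p for a, p in zip(priors, likelihoods)]
    total = sum(joint)                  # the denominator of Bayes' theorem
    return [j / total for j in joint]

# Three hypotheses with illustrative (invented) numbers.
post = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.5])
```

The returned list sums to one, and the largest entry identifies the "most likely" hypothesis in the sense used below.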
A discrete multivariate and multiparameter population will be said to belong
to the exponential class of populations (cf. Blackwell and Girshick [1] and
Girshick and Savage [5]) if its probability distribution can be represented by

$$p(x, \theta) = q(\theta)\, r(x) \exp\left\{ \sum_{i=1}^{h} s_i(\theta)\, t_i(x) \right\},$$

where $x$ is the row vector $x = (x_1, x_2, \ldots, x_k)$, $\theta$ is the row
vector $\theta = (\theta_1, \theta_2, \ldots, \theta_h)$, $q(\theta)$ and $r(x)$ are
nonnegative functions of $\theta$ and $x$ respectively, and the parameter space is
assumed to be an open convex set in an $h$-dimensional Euclidean space. We have $k$
variates and $h$ parameters, with the number of products in the exponent of $e$
being $h$. Examples of discrete populations of the exponential class are the
binomial distribution with the single parameter $p$, the Poisson distribution, the
geometric distribution, the multinomial distribution, and the multivariate Poisson
distribution.

Now consider $r$ populations of the exponential class, each of the same
functional form but differing only in their parameters. Let the probabilities be
given by $p(x, \theta_m) > 0$, where $\sum_x p(x, \theta_m) = 1$ for
$m = 1, 2, \ldots, r$. Suppose that we have a single random sample of $N$
independent observations from one of these $r$ populations (we do not know which
population) and we wish to decide, on the basis of the sample values, which of the
$r$ populations is the most likely source of the sample. We shall use the term
"most likely" in the sense of "having the largest a posteriori probability" and we
shall assume that the a priori probabilities $\alpha_m$ are already known.

Let $E$ denote the random sample and let
$\hat\theta = (\hat\theta_1, \ldots, \hat\theta_h)$ denote the maximum-likelihood
estimate of $\theta$.


3. Inequalities. The information measure $I(1:2)$ was introduced by Kullback
and Leibler [12] as a generalization to the abstract case of a definition of
information independently introduced in 1948 by Shannon [15] and by Wiener [17].
(See also Kullback [9], [10], and [11] for uses in statistics of $I(1:2)$. $I(1:2)$
has recently been termed "Kullback-Leibler information number" (Chernoff [3])
and "K-L information number" (Bradt and Karlin [2]).)

We obtain for two discrete populations of the exponential class

$$I(1:2) = \sum_x p(x, \theta_1) \log \frac{p(x, \theta_1)}{p(x, \theta_2)}
= \log \frac{q(\theta_1)}{q(\theta_2)}
+ \sum_{i=1}^{h} [s_i(\theta_1) - s_i(\theta_2)]\, E_1[t_i(x)],$$

where the probabilities for the first population are given by $p(x, \theta_1)$, the
probabilities for the second population are given by $p(x, \theta_2)$, and $E_1$
denotes expected values with respect to the first population. The logarithms are
natural logarithms.

We now define the Kullback-Leibler information-statistic for a random
sample of $N$ independent observations from the $m$th population as

$$\hat{I}_m = N \sum_x p(x, \hat\theta) \log \frac{p(x, \hat\theta)}{p(x, \theta_m)}
= N \log \frac{q(\hat\theta)}{q(\theta_m)}
+ N \sum_{i=1}^{h} [s_i(\hat\theta) - s_i(\theta_m)]
\left( E\{t_i(x)\} \right)_{\theta = \hat\theta}.$$

In $I(1:2)$, which is a functional of the vectors $\theta_1$ and $\theta_2$ only,
$\theta_1$ has been replaced by the maximum-likelihood estimate $\hat\theta$ and
$\theta_2$ has been replaced by the set of parameters $\theta_m$ of the
hypothetical $m$th population. The sum has been multiplied by $N$ since the
information measure for $N$ independent observations is $N$ times the information
measure for a single observation.

The Kullback-Leibler information-statistic for samples from discrete
populations of the exponential class (as well as for samples from more general
statistical populations, discrete or continuous, univariate or multivariate,
uniparameter or multiparameter) has useful applications in mathematical
statistics. If we set up a null hypothesis that the given sample of $N$
independent observations was randomly drawn from the specified $m$th population,
then it can be shown that $2\hat{I}_m$, with $\hat{I}_m$ as defined above, is
asymptotically distributed as chi-square with $h$ degrees of freedom when the null
hypothesis is true (Kupperman [13], [14]).

We shall now show that the following inequality relationships exist connecting 
the a posteriori probabilities, the a priori probabilities, and the Kullback-Leibler 
information-statistics: 

THEOREM. For two discrete populations $H_m$ and $H_n$ of the exponential class we
have $p(H_m \mid E) \ge p(H_n \mid E)$ if and only if
$\hat{I}_m \le \hat{I}_n + \log(\alpha_m/\alpha_n)$, with both relations being
equalities or strict inequalities simultaneously.

PROOF. From $p(H_m \mid E) \ge p(H_n \mid E)$ we obtain, using Bayes' theorem and
simplifying,

$$[q(\theta_m)]^{-N} \exp\left\{ -\sum_{j=1}^{N} \sum_{i=1}^{h} s_i(\theta_m)\, t_i(X_j) \right\}
\le [q(\theta_n)]^{-N} \exp\left\{ -\sum_{j=1}^{N} \sum_{i=1}^{h} s_i(\theta_n)\, t_i(X_j) \right\}
\cdot \frac{\alpha_m}{\alpha_n},$$

where $X_j$ is the value of the $j$th observation on $x$, $j = 1, 2, \ldots, N$.
Now it can be shown (Kupperman [14]) that for populations of this class we have
identically

$$\left( E\{t_i(x)\} \right)_{\theta = \hat\theta} = \frac{1}{N} \sum_{j=1}^{N} t_i(X_j).$$

(The discrete populations of the exponential class now being considered belong
to the class of distributions admitting sufficient estimates of the parameters
$\theta$; these are distributions of the Koopman-Pitman type.) Hence by multiplying
both sides of the inequality by the positive quantity

$$[q(\hat\theta)]^{N} \exp\left\{ N \sum_{i=1}^{h} s_i(\hat\theta)
\left( E\{t_i(x)\} \right)_{\theta = \hat\theta} \right\}$$

and taking logarithms, we obtain

$$\hat{I}_m \le \hat{I}_n + \log \frac{\alpha_m}{\alpha_n}.$$


Since the steps of the proof are all reversible, the theorem is proved. The follow- 
ing corollary is an immediate consequence of this theorem: 

COROLLARY. If the a priori probabilities are all equal, the choice of the hypothesis
(or hypotheses) with the minimum Kullback-Leibler information-statistic is the
same as the choice of the hypothesis (or hypotheses) with the maximum a posteriori
probability, and conversely.
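
The equivalence between ranking hypotheses by a posteriori probability and by information-statistic can be checked numerically for one member of the exponential class. The sketch below uses the Poisson distribution, for which $q(\theta) = e^{-\theta}$, $s(\theta) = \log\theta$, $t(x) = x$, so that the statistic reduces to $N[\theta_m - \hat\theta + \hat\theta \log(\hat\theta/\theta_m)]$ with $\hat\theta$ the sample mean. The function names, the sample and the priors are invented for illustration, and the sample mean is assumed to be positive.

```python
import math

def log_posterior_weights(sample, thetas, priors):
    """log(alpha_m * likelihood_m) for Poisson means, up to a common constant."""
    n, total = len(sample), sum(sample)
    return [math.log(a) - n * th + total * math.log(th)
            for th, a in zip(thetas, priors)]

def info_statistics(sample, thetas):
    """Information-statistic for each hypothetical Poisson mean theta_m."""
    n = len(sample)
    theta_hat = sum(sample) / n         # maximum-likelihood estimate
    return [n * (th - theta_hat + theta_hat * math.log(theta_hat / th))
            for th in thetas]

sample = [2, 3, 1, 4, 2]                # invented observations
thetas = [1.0, 2.5, 4.0]                # hypothetical population means
priors = [0.5, 0.2, 0.3]                # invented a priori probabilities
lw = log_posterior_weights(sample, thetas, priors)
stats = info_statistics(sample, thetas)
```

For every pair (m, n), `lw[m] >= lw[n]` holds exactly when `stats[m] <= stats[n] + log(priors[m] / priors[n])`, which is the statement of the theorem; with equal priors the comparison reduces to the corollary.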


4. Continuous exponential-class populations. Although the preceding two
sections are concerned with discrete populations, it may be remarked that the
theorem and corollary may easily be extended to samples from continuous
populations of the exponential class (such as the univariate normal distribution,
the chi-square distribution, and the multivariate normal distribution with $k$
means and $k(k + 1)/2$ parameters in the variance-covariance matrix). We make use
of Bayes' theorem for continuous distributions (Kolmogorov [8], p. 46), use the
likelihood of the observed sample instead of the probability of the observed
sample, and follow the same steps as in the proof in the discrete case. The
statements concerning $(E\{t_i(x)\})_{\theta = \hat\theta}$ and the asymptotic
distribution of $2\hat{I}_m$ remain valid for continuous as well as discrete
distributions of the exponential class.


5. Application. The theorem and the corollary are applicable to problems in 
which the a priori probabilities can be expressed in exact numerical form and 
thus the application of Bayes’ theorem is legitimate, as, for example, in Men- 
delian hypotheses (see David [4], Chapter VIII). 

In connection with the theorem and corollary, it may be remarked that the
statements hold true if common logarithms (or logarithms to any base) are used
in place of natural logarithms. This point is of importance, for in practical work
common logarithms are more frequently used. However, in connection with
the approximation of the large-sample distribution of $2\hat{I}$ by a chi-square
distribution, it is important that natural logarithms be used, or that if common
logarithms have been used $2\hat{I}$ be multiplied by $\log_e 10$, or 2.30259
approximately.

In conclusion, it may be remarked that if we were to use the corollary and
decide always to accept the hypothesis for which $\hat{I}$ is the minimum without
regard to the a priori probabilities involved, then we are in effect tacitly
assuming that the a priori probabilities are equal, which is Bayes' postulate (as
distinguished from Bayes' theorem).


The connection between information theory and inverse probability has been 
noted by Good [7], who is also concerned with the terminology and notation of 
information theory, particularly as it is applicable to communication theory. 
Reference should also be made to Good [6] for an informative discussion on 
Bayes’ theorem and inverse probability. 


6. Acknowledgment. The author wishes to thank Professor S. Kullback and 
the referees for suggesting the generalization incorporated in the results of this 
paper, which were derived originally for the special case of multinomial sampling. 
ON THE DISTRIBUTION OF 2 × 2 RANDOM NORMAL
DETERMINANTS


By W. L. Nicholson


Princeton University


1. Summary. The c.d.f. of a 2 × 2 random determinant with mutually
independent normally distributed entries is derived as an infinite series. Error
functions that bound the tail of this series facilitate numerical calculation.
Conditions are imposed on four variable quadratic forms for this distribution to
apply. A normal approximation to the distribution is suggested.


2. Introduction. Let $X_1$, $X_2$, $X_3$ and $X_4$ be mutually independent random
variables, each normally distributed, with means $\mu_1$, $\mu_2$, $\mu_3$ and
$\mu_4$, and common variance $\sigma^2$. Let $D$ be the random determinant,

$$D = \begin{vmatrix} X_1 & X_2 \\ X_3 & X_4 \end{vmatrix} = X_1 X_4 - X_2 X_3.$$

If all the $\mu_i$ vanish the p.d.f. of $D/\sigma^2$ is easily calculated to be the
Laplace distribution [5],

$$\tfrac{1}{2} \exp\{-|x|\}.$$

When the $\mu_i$ are not zero the distribution is, in general, skewed and not
expressible in a simple closed form.

Craig [1] derived the p.d.f. of the product of two normal variables (not
necessarily independent) as an infinite series of Bessel functions. Theoretically,
his result plus the convolution formula for density functions determines the p.d.f.
of $D$. However, the form of such an answer is not particularly adapted to
numerical work. Most methods for handling the distribution problems connected
with normal variable quadratic forms are not applicable here. The reasons for
this are, first, that $D$ is not a definite form, and, second, that it cannot be
represented as a linear combination of central Chi-Square variables. The former
obstacle can be overcome to a measured degree by several different procedures;
e.g., Pitman's and Robbins' method of mixtures [6] and Gurland's Laguerrian
expansions [3]. The latter causes more difficulty. There does not seem to be an
adequate technique available to handle linear combinations of non-central
Chi-Square variables.

Our approach is basically a brute force method consisting of straightforward 
inversion of the characteristic function of D. The independence and homoscedas- 
ticity assumptions cannot be relaxed without greatly complicating this inversion 
problem. In the process a single integration leads to the c.d.f. of D. Percentage 
points are immediately available without resorting to quadratures. 


In the sequel $\sigma = 1$. There is no loss of generality in this simplification,
since $\sigma$ appears as a scale parameter in the distribution of $D$; i.e., we
derive the distribution of the normalized variable $D/\sigma^2$.

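The characteristic function of $D$ is easily calculated to be

$$\phi_D(t) = E\,e^{itD} = (1 + t^2)^{-1}
\exp\left\{ \frac{-\lambda t^2 + 2i\Delta t}{2(1 + t^2)} \right\}, \tag{2.1}$$

where

$$\lambda = \mu_1^2 + \mu_2^2 + \mu_3^2 + \mu_4^2, \qquad
\Delta = \begin{vmatrix} \mu_1 & \mu_2 \\ \mu_3 & \mu_4 \end{vmatrix}
= \mu_1\mu_4 - \mu_2\mu_3.$$

Thus, we see that the distribution of $D$ depends on the means only in the form
$\lambda$ and $\Delta$. Expand $\log \phi_D(t)$ in powers of $t$ to get the
semi-invariants of $D$ as

$$\kappa_{2k} = (2k)! \left( \frac{\lambda}{2} + \frac{1}{k} \right) \; (k \ge 1),
\qquad \kappa_{2k+1} = (2k + 1)!\,\Delta \; (k \ge 0).$$

The mean and variance of $D$ are

$$E(D) = \Delta, \qquad \operatorname{Var}(D) = \lambda + 2. \tag{2.4}$$

The coefficients of skewness and excess are

$$\gamma_1 = \frac{6\Delta}{(\lambda + 2)^{3/2}}, \qquad
\gamma_2 = \frac{12(\lambda + 1)}{(\lambda + 2)^2}. \tag{2.5}$$
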

The distribution is skewed if and only if $\Delta \ne 0$. From (2.5) this skewness
is never great. In fact $|\gamma_1| \le 2\sqrt{6}/3$ and
$\gamma_1 = O(\lambda^{-1/2})$ for large $\lambda$. The excess $\gamma_2$ is
monotone decreasing in $\lambda$ with a maximum value of three for $\lambda = 0$.
Also, $\gamma_2 = O(\lambda^{-1})$ for large $\lambda$. Thus, if at least one
$\mu_i$ is large the distribution is almost symmetric and of approximately the same
peakedness as that of the normal. In Section 4 we show that $D$ (appropriately
normalized) is approximately normally distributed for large $\lambda$.
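
The quantities of this section are easy to tabulate for a given set of means. The sketch below takes $\sigma = 1$ and uses the expressions that follow from expanding the logarithm of (2.1): mean $\Delta$, variance $\lambda + 2$, skewness $6\Delta/(\lambda + 2)^{3/2}$ and excess $12(\lambda + 1)/(\lambda + 2)^2$. The function name and the return layout are assumptions made for illustration.

```python
def determinant_moments(mu1, mu2, mu3, mu4):
    """Summary constants for D = X1*X4 - X2*X3 with unit variances (sketch)."""
    lam = mu1 ** 2 + mu2 ** 2 + mu3 ** 2 + mu4 ** 2
    delta = mu1 * mu4 - mu2 * mu3
    mean = delta
    var = lam + 2.0
    skew = 6.0 * delta / var ** 1.5          # third cumulant over var^(3/2)
    excess = 12.0 * (lam + 1.0) / var ** 2   # fourth cumulant over var^2
    return lam, delta, mean, var, skew, excess
```

With all means zero this returns variance 2, zero skewness and excess 3, which are the values of the Laplace distribution quoted in the Introduction.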


3. Exact Distribution. The functional form of the characteristic function (2.1)
indicates that the p.d.f., $f_{\lambda,\Delta}$, and the c.d.f.,
$F_{\lambda,\Delta}$, of $D$ satisfy for all real $x$

$$f_{\lambda,\Delta}(x) = f_{\lambda,-\Delta}(-x), \qquad
F_{\lambda,\Delta}(x) = 1 - F_{\lambda,-\Delta}(-x). \tag{3.1}$$

Hence we need only consider the distribution of $D$ for negative argument. In
the remainder of the paper $x$ always satisfies $x < 0$. The c.d.f. of $D$ is not
expressible in a simple closed form (unless $\Delta = \lambda/2$). Introduction of
an appropriate error term does make it possible to represent it as a damped
polynomial in $|x|$ with coefficients that are elementary functions of $\lambda$
and $\Delta$. Let $R$ be any set of non-negative integers. The c.d.f. of $D$ can be
written as (see Sec. 5)

$$F_{\lambda,\Delta}(x) = \sum_{r \in R} \sum_{t=0}^{r}
h(r, t)\, g(r, t \mid \Delta, \lambda/2, |x|) + L, \tag{3.2}$$


where L satisfies 


(3) 
-aj2 \2 
si<s e ——— a2 9, 
2 ven rt 
(3.3) 
Aaely 
] —A/2 2 
OsL<- é Se A<0 
2 eR r 


The auxiliary functions $h$ and $g$ are defined by

$$h(r, t) = \sum_{j=0}^{r-t} \binom{2r - t + 1}{j} \left( \frac{1}{2} \right)^{2r - t + 1},
\qquad
g(r, t \mid a, b, c) = e^{-b-c} \sum_{j=0}^{t}
\frac{a^j (b - a)^{r-j} c^{t-j}}{j!\,(r - j)!\,(t - j)!}. \tag{3.4}$$

Here $h(r, t)$ is just the probability of not more than $r - t$ heads on
$2r - t + 1$ flips of an unbiased coin. Several tables of $h$ are available; e.g.,
[7]. The function $g$ satisfies a number of recursion formulae. The most useful of
these for computational purposes is

$$t\,g(r, t \mid a, b, c) = a\,g(r - 1, t - 1 \mid a, b, c) + c\,g(r, t - 1 \mid a, b, c). \tag{3.5}$$

The boundary condition,

$$g(r, 0 \mid a, b, c) = e^{-a-c} \left[ e^{-(b-a)} \frac{(b - a)^r}{r!} \right], \tag{3.6}$$

and (3.5) provide a rapid method of generating a matrix of $g$ values for any
triple $(a, b, c) = (\Delta, \lambda/2, |x|)$. The right side of (3.6) is most
easily calculated as the product of two tabular values. The bracket term as a
Poisson density is tabled (see [4]).
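
The series (3.2) lends itself to direct computation. The following Python sketch implements $h$ as a binomial tail and $g$ as the finite sum $e^{-b-c} \sum_{j=0}^{t} a^j (b-a)^{r-j} c^{t-j} / (j!\,(r-j)!\,(t-j)!)$, then sums over $R = \{0, 1, \ldots, M\}$ and drops the remainder $L$. This reading of (3.4) is an assumption: it was chosen because it satisfies the recursion (3.5), and because the resulting sum reproduces the Laplace c.d.f. when all means vanish and the closed form $\tfrac{1}{2} e^{-\lambda/4 - |x|}$ when $\Delta = \lambda/2$. The function names are hypothetical.

```python
import math

def h(r, t):
    """Probability of at most r - t heads in 2r - t + 1 fair coin flips."""
    n = 2 * r - t + 1
    return sum(math.comb(n, j) for j in range(r - t + 1)) / 2.0 ** n

def g(r, t, a, b, c):
    """Auxiliary function g(r, t | a, b, c) for 0 <= t <= r (assumed form)."""
    total = 0.0
    for j in range(t + 1):
        total += (a ** j * (b - a) ** (r - j) * c ** (t - j)
                  / (math.factorial(j) * math.factorial(r - j)
                     * math.factorial(t - j)))
    return math.exp(-b - c) * total

def cdf_series(lam, delta, x, max_r):
    """Partial sum of (3.2) over R = {0, ..., max_r} for x <= 0, without L."""
    a, b, c = delta, lam / 2.0, abs(x)
    return sum(h(r, t) * g(r, t, a, b, c)
               for r in range(max_r + 1) for t in range(r + 1))
```

For production use one would generate the $g$ values with the recursion (3.5) instead of the closed form; the closed form is used here only because it keeps the sketch short.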

The bound (3.3) on $L$ is quite good for $\Delta \ge 0$. Numerical checks show, for
example, that for values of $\lambda$ of the order of ten a bound of 0.01 on $L$
adds only one integer to $R$ over and above that necessary to give an error
$\le 0.01$. For $\Delta < 0$ the bound is admittedly rather poor and certainly
could be improved.

To minimize the calculation necessary to evaluate $F_{\lambda,\Delta}(x)$ the set
$R$ should contain as few elements as possible. To accomplish this and still
maintain a specified bound on the error $R$ should consist only of integers in an
appropriate interval including $\lambda/2$ (at least when $\Delta \ge 0$). However,
from the standpoint of iterated computation of the $g$ function the optimum $R$ set
is $\{0, 1, 2, \ldots, M\}$ for suitable $M$. In this case the bound (3.3) on $L$
is, except possibly for an exponential factor, the tail area of a Poisson
distribution; its value can be read directly from tables [4].

At least three values of $\Delta$ lead to extreme simplification in the formula
(3.2). These values are the maximum and the minimum $\Delta$ value for fixed
$\lambda$, and $\Delta = 0$. The simplified forms make possible several quickly
computed bounds on $F_{\lambda,\Delta}(x)$. The simplifications are
' 27S" — 1 {ax (A/2)’ 
, , —(A/2)—Iz o “ 4 é 
4 AT) = enna as aia + ° 
I A,—A/2\2 ¢ 2 ~ Or—t+1 t! -! ; L; 
reR t=0 - . r 3 
(27 r 6 (4 /9Q)" 
) + ¢ —(A/2)—|z] t A/2) 
Fyo(x) =e 2, 2, 0, } —- ——— + L, 
reR t=0 t! me 
’ —(A —|z 
Faap(x) = 3 . 


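The bound (3.3) on $L$ (with $\Delta \ge 0$) is also applicable for the first two
lines of (3.7). Since $F_{\lambda,\Delta}(x)$ for fixed $\lambda$ and $x$ is a
monotone decreasing function of $\Delta$ ($-\lambda/2 \le \Delta \le \lambda/2$),
the following inequalities are immediately available.

$$\begin{aligned}
F_{\lambda,-\lambda/2}(x) \ge F_{\lambda,\Delta}(x) \ge F_{\lambda,0}(x),
& \qquad \Delta \le 0, \\
F_{\lambda,0}(x) \ge F_{\lambda,\Delta}(x) \ge F_{\lambda,\lambda/2}(x),
& \qquad \Delta \ge 0.
\end{aligned} \tag{3.8}$$

A simple but interesting application of (3.8) is the following bound on the
probability that $D$ is negative.

$$\begin{aligned}
\tfrac{1}{2} e^{-\lambda/4} \le \Pr\{D \le 0\} \le \tfrac{1}{2},
& \qquad \Delta \ge 0, \\
\tfrac{1}{2} \le \Pr\{D \le 0\} \le 1 - \tfrac{1}{2} e^{-\lambda/4},
& \qquad \Delta \le 0.
\end{aligned} \tag{3.9}$$

These are the best bounds possible that are functions of $\lambda$ only.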

The quadratic form $D$ is by no means representative of the class of all such
forms in four variables. In general, the distribution of a four variable form is
more complicated than that of $D$. There are certain special cases, however,
when the distribution function (3.2) does apply. The following theorem gives
a necessary and sufficient condition on four normal variables and on the
accompanying form for the distribution to be that of this paper.

THEOREM. Let $X = (X_1, X_2, X_3, X_4)$ be a random vector distributed by a
multivariate normal law with mean vector $\mu$ and covariance matrix $\Sigma$. Let
$M$ be a $4 \times 4$ symmetric matrix. The quadratic form $XMX'/2d$ is distributed
according to the law (3.2) if and only if the eigenvalues of the matrix $M\Sigma$
are $d$, $d$, $-d$ and $-d$. If such is the case
$\lambda = \mu\Sigma^{-1}\mu'$ and $\Delta = \mu M \mu'/2d$.

A proof is easily constructed by identifying the characteristic function of
$XMX'/2d$ with (2.1).


4. Normal Approximation. Let $\bar{D} = D/(2 + \lambda)^{1/2}$. Then, if $\lambda$
increases without bound in such a manner that
$\Delta/(2 + \lambda)^{1/2} \to a$, we have from (2.1) that
$\phi_{\bar{D}}(t) \to \exp(iat - t^2/2)$. So, by the continuity theorem for
characteristic functions [2], for large $\lambda$ $\bar{D}$ is approximately normal
with mean $\Delta/(2 + \lambda)^{1/2}$ and unit variance. The question of how large
$\lambda$ must be for the approximation to render reasonable accuracy is quite
difficult to answer. The following remarks
are offered to give some insight into this problem. Clearly, the rapidity of the 
convergence depends upon A and | z | in some fashion. For A = 0 the approxi- 
mation is very good since this is the symmetric case. With A fixed the accuracy 
decreases as |x! increases. Numerical checks indicate that for |z| less than 
three and A about 20 the relative error in the c.d.f.is less than 5% . For the general 
case with A not too far different from zero the accuracy seems to be roughly a 
monotone decreasing function of |x| + A for fixed A. With |x| + A less than 
four and A about 20 the relative error is less than 7%. For large numerical 
values of A the approximation is extremely poor. For example, if A = O(A) 
and if | 2 | > 0, then the relative error approaches 100% as A increases. 
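
The normal approximation of this section amounts to treating $D$ as normal with mean $\Delta$ and variance $2 + \lambda$ (with $\sigma = 1$). A minimal sketch, with the function name as an assumption:

```python
import math

def normal_approx_cdf(x, lam, delta):
    """Normal approximation to Pr{D <= x}: mean delta, variance lam + 2."""
    z = (x - delta) / math.sqrt(lam + 2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For all means equal to zero and `x = -1` this gives about 0.24, while the exact Laplace value is $\tfrac{1}{2} e^{-1} \approx 0.18$, which illustrates that the approximation is meant for large $\lambda$ only.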


5. Derivation of Exact Distribution of $D$. Since $\phi_D$ is Lebesgue integrable
the Lévy inversion formula [2] gives the p.d.f. of $D$ as





1 +c _ oe r) A*(A/2)* 
( 8 eu | szt t It a Se Ala) 
01 Sa.a 2) 2. e dp(t) C 9 2 TG : BI 
9.1) aa : 
arin Q(z, z)\* - 
Here, 
Te —1zt 
(5.2) Q(z, z) = ~f[. =z ad = er’ Ws 


forz < Oandz> 0. 
Evaluate the $j$th partial derivative of $Q$ with respect to $x$ first and integrate
$f_{\lambda,\Delta}(x)$ over the interval $(-\infty, x)$. Then (5.1) becomes


—A/2 2 ar 5 
ea 1 0 m= A a al 
(53) Fyra(z) =o —_ 1\y 0 (a2 a ‘) ite | L. 
— peat 2 2 (4) 02" E 2 . | cen + 


where $R$ is any set of non-negative integers and $L$ consists of the remainder
of the series; i.e., those terms such that $r \notin R$ (here $r = j + k$).

Rewrite the factor (Az’” — A/2)' as an rth partial derivative of the appro- 
priate exponential function of W evaluated at zero. Use Leibnitz’s Rule for 
differentiating a product to compute the rth partial with respect to z of the 
resulting function after choosing one product factor as 2”. Employ the iden- 
tity, 


af Pp . 1 

} —_— - 2P — i\ (2 
(5.4) - ce = (—1)’Ple 7 > a 
OzP 1 P a! 


z=1 i=0 


with a = |x| — AW, and complete the differentiation with respect to W. Con- 
siderable algebraic simplification involving routine summing of finite combina- 
torial type series gives the form (3.2) after the appropriate identifications have 
been made with the A and g functions defined by (3.4). The bounds (3.3) and the 
simplifications (3.7) result from straightforward algebra. Details are omitted. 


6. Acknowledgement. The author expresses his appreciation to Dr. L. Marcus
whose interest in this problem gave impetus for the paper. Thanks are also due
to Professor J. W. Tukey for several helpful suggestions.
A SMOOTH INVERTIBILITY THEOREM¹
By John W. Tukey
Princeton University


1. Introduction. In connection with discussions of fiducial inference (e.g.,
see [3]), it is often desirable to consider the invertibility of certain mappings.
We shall say that a mapping is smoothly invertible (of class a) if (condition
(1) of [3] is irrelevant here):

(2) the mapping is 1-1 and hence has a single-valued inverse,

(3) this inverse is a continuous function (and has continuous derivatives of
all orders up to a).

All too often, as has been emphasized to the author by L. J. Savage, the question
of invertibility has been “answered” by showing that a Jacobian is of constant
sign. It is, of course, well known that this does not suffice to give uniqueness in
the large.

Explicit conditions sufficient for uniqueness in the large do not seem to be 
given frequently in the literature. The present note records an explicit theorem 
in a form which seems likely to be of service in such conditions. 
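
The remark that a Jacobian of constant sign does not give uniqueness in the large can be checked on a standard textbook map, which is not taken from this note: f(x, y) = (e^x cos y, e^x sin y). The Python sketch below uses hypothetical helper names (`f`, `jacobian_det`) and only shows that the Jacobian is positive everywhere while two distinct points share one image. The range here is the plane with the origin removed, which is not simply connected.

```python
import math

def f(x, y):
    """Map the plane onto the punctured plane: (x, y) -> (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y):
    """Determinant of the Jacobian matrix of f, computed entry by entry."""
    ex = math.exp(x)
    du_dx, du_dy = ex * math.cos(y), -ex * math.sin(y)
    dv_dx, dv_dy = ex * math.sin(y), ex * math.cos(y)
    return du_dx * dv_dy - du_dy * dv_dx   # equals e^(2x), always positive

p = (0.3, 1.0)
q = (0.3, 1.0 + 2.0 * math.pi)   # a different point of the domain

print(jacobian_det(*p), jacobian_det(*q))   # both positive
print(f(*p), f(*q))                         # the same image up to rounding
```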


2. A smooth invertibility theorem. We now state the smooth invertibility 
theorem as follows: 

Any a times continuously differentiable mapping from an arcwise connected 
open domain (in n dimensions) to a simply connected range, whose Jacobian deter- 
minant is continuous and of one sign throughout the domain, and whose inverse 
carries compact sets into compact sets is smoothly invertible of order a. 

In our application it is convenient to use the 

Observation. If the open domain and the simply connected range are both
the whole plane (or the whole of any Euclidean space) then the inverse will carry
compact sets into compact sets provided that it carries bounded sets into bounded
sets.

The proof of this observation follows immediately from the remarks that (i)
the inverse images of closed sets by a continuous mapping are always closed, and (ii)
in the whole plane the compact sets are just those which are closed and bounded.


3. Proof. The proof of the theorem rests on a classical result about local
inversion (which makes no use of arcwise connectedness, simple connectedness, or
the hypothesis about the inverse taking compact sets into compact sets) and a
purely topological result relating local uniqueness of inverses to global
uniqueness (which makes no explicit use of the differentiability conditions).

We say that N is a local inverse neighborhood of x if 

(1) f(N) is a neighborhood of y = f(x),
(2) there is a choice g_N(y') defined for y' in f(N) such that f(g_N(y')) = y'
for any y' in f(N).
If there is but one inverse image x' in N for each y' in f(N), N is a unique local
inverse neighborhood. If every x has a unique local inverse neighborhood, we
say that y = f(x) has unique local inverses.

The classical result (given for example in [1] on pp. 257-258 of Vol. 2) asserts 
that under our differentiability and Jacobian hypothesis, y = f(x) has unique 
local inverses, and that, if we restrict the neighborhoods sufficiently, these in- 
verses are a times continuously differentiable. 

All that is lacking is the knowledge that these local inverses are unique, not
only when we must go back to a neighborhood N of a particular solution x of
y = f(x), but when we are free to go back to any x. This will follow from a
special case of the covering homotopy theorem, which will be derived from a
more usual form in the next section, namely: If y = f(x) is continuous and has
unique local inverses, if X is a compact metric space, if x_t is a continuous image
of the unit interval 0 ≤ t ≤ 1 in X, if y_{t,s} is a continuous image of the unit square
0 ≤ t, s ≤ 1 in f(X), and if y_{t,1} = f(x_t), then it is possible to define a
continuous image x_{t,s} of the unit square in X so that

(1) x_{t,1} = x_t,
(2) f(x_{t,s}) = y_{t,s}

(and indeed this can be done in at most one way).

Using this result we can complete the proof of the smooth invertibility
theorem as follows: Let x_0 and x_1 be any two solutions (possibly coincident) of
y_0 = f(x). Since X is arcwise connected, we may join x_0 to x_1 by an arc x_t for
0 ≤ t ≤ 1. Let y_{t,1} = f(x_t). Since y_{0,1} = y_{1,1} = y_0, the image of this arc is a
closed curve (in f(X)). Since f(X) is simply connected, this curve can be shrunk to
the point y_0, keeping its ends at y_0; that is to say, we can define y_{t,s} for
0 ≤ t, s ≤ 1 as a continuous extension of y_{t,1}, with y_{t,0} = y_{0,s} = y_{1,s} = y_0.
Let H be the set of all y of the form y_{t,s} for 0 ≤ t, s ≤ 1. As the continuous image
of a compact space this will be compact and will hence be closed in f(X). Let
G be the set of x for which y = f(x) lies in H. Because of our hypothesis, G
will also be compact. Surely G contains x_t for 0 ≤ t ≤ 1.

Now apply the topological result to G. We have then a continuous image
x_{t,s} of 0 ≤ t, s ≤ 1 which satisfies x_{0,1} = x_0, x_{1,1} = x_1, and

f(x_{t,0}) = f(x_{0,s}) = f(x_{1,s}) = y_0.

The images of the three sides s = 0, t = 0, and t = 1 of the unit square thus
provide an arc leading from x_0 to x_1, every point of which maps into y_0. The
local uniqueness of inverses now ensures that this arc is a constant mapping,
and hence that, in particular, x_0 = x_1. Thus the solution of y_0 = f(x) is shown
to be unique, and the proof of the smooth invertibility theorem is concluded.

There is some interest in the need for all the topological hypotheses of this 
theorem, so examples are given in Section 5 to show that no one of these three 
hypotheses can be removed without the conclusion failing. 
4. Proof of topological result. The textbook of Seifert and Threlfall [2] gives,
as Satz I and Satz II (pp. 186-188), theorems from which we can immediately
derive a close analog of the result we used. The differences will be as follows:
(i) the result will be restricted to a topological complex (which would suffice,
since we wish to apply it only to n-dimensional open domains), (ii) the condition
that the inverse image of compact sets is compact would be absent, (iii) the
condition of unique local inverses would be strengthened to local
homeomorphism. We need then only show that our extra condition implies local
homeomorphism.


Let y be a point of f(X). Let x_1, x_2, ..., x_k be its inverse images. Let N_1,
N_2, ..., N_k be corresponding unique local inverse neighborhoods. Let K be
the part of X not in any of these N_i. Consider L, the complement of the closure
of f(K). If y is in L, then the intersection of L with all f(N_i) is a neighborhood
of y each of whose points has exactly k inverse images, one each in N_1, N_2,
..., N_k, and since these local inverses are continuous, we have the desired
local homeomorphism.

If y is not in L, then it is in the closure of f(K), and there is a sequence
y_j in f(K) which converges to y. Let Y = {y, y_1, y_2, ...}. Then Y is compact.


Fig. 1. Second step in the mapping (strip after looping).
Let x_j in K satisfy f(x_j) = y_j. Then x_j is in the inverse image of Y, which is
compact. Hence we can extract a subsequence x_{j_i} which converges to some x_0.
Clearly f(x_0) = lim f(x_{j_i}) = lim y_{j_i} = y. Hence x_0 is in some N_i, and hence
not in K, which is a contradiction. The desired result is thus proved.


5. Remarks and counter examples. In this section we show that no one of the
three topological hypotheses can be omitted. In each case we show that uniqueness
of the inverse is immediately lost. For the first two hypotheses the example
can be very simple and one-dimensional. For the third a two-dimensional,
not-too-simple example is provided.

If we drop the connectedness requirement for X, then we may take X as two
non-intersecting straight lines and f(x) as a rigid application of each on a third.
Uniqueness is lost. (Consideration shows, indeed, that ε-connectedness for all
ε > 0 would suffice.)

If we drop the simple connectedness of f(X), then we may take X as a circle,
f(X) as a circle of half that diameter, and the mapping f as the winding of an
inextensible loop of string around the smaller circle. See Fig. 1. (We cannot use
the whole infinite line without having the inverse image of a compact set become
non-compact.)
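
The circle example can be made concrete by measuring both circles in angles: winding an inextensible loop around a circle of half the diameter doubles the angle. The following Python sketch rests on that reading, and the function names are hypothetical. It exhibits two distinct inverse images of one point of the smaller circle, although the map is locally one-to-one everywhere.

```python
import math

TWO_PI = 2.0 * math.pi

def wind_twice(theta):
    """Angle on the large circle -> angle on the small circle (arc length is kept)."""
    return (2.0 * theta) % TWO_PI

def preimages(phi):
    """The two angles in [0, 2*pi) that wind_twice sends to the angle phi."""
    return (phi / 2.0, phi / 2.0 + math.pi)

first, second = preimages(1.0)
print(first, second)                          # two different points of X
print(wind_twice(first), wind_twice(second))  # both equal to 1.0 up to rounding
```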

In the third example we drop the requirement on compactness of inverse
images. Here the example is a mapping of the plane into part of itself which is
most easily described qualitatively and geometrically. We begin by deforming
the plane into a long open strip, which can clearly be done with a positive,
non-zero Jacobian. We now consider a transformation of the strip into a simply
looped strip in which uniqueness of inverse has been lost but simple connectedness
of range has not been achieved. Graphically, corresponding points appear
as in Fig. 1. The failure of simple connectivity is due to the small circular disk
bounded by the image of the arc CEG. If we loop the strip again in the same way,
we can use the new loop to cover this hole, meanwhile placing the new hole on
the old loop. The resulting transformation will have a simply connected range,
but the inverses will not be unique. This is the desired third example.
ON THE INVERSION OF THE SAMPLE COVARIANCE MATRIX
IN A STATIONARY AUTOREGRESSIVE PROCESS¹


By M. M. Siddiqui²
University of North Carolina


Let x_1, ..., x_N be the observations on a variate at times t = 1, ..., N. It
is assumed that the underlying model is an autoregressive scheme of order k,

(1)    a_0 x_t + a_1 x_{t-1} + ... + a_k x_{t-k} = z_t,

where the z's are independent N(0, 1) variates and the roots of the equation

       \sum_{j=0}^{k} a_j w^{k-j} = 0

lie inside the unit circle |w| = 1 in the complex plane. The variate z_t is, then,
independent of x_{t-1}, x_{t-2}, ... ([2], p. 38). It is further assumed that the process
is stationary so that E x_t, E x_t x_{t+j}, j = 0, 1, 2, ... are independent of t.
Writing σ² for the variance of any x_t we observe that, since E x_t = 0, σ² = E x_t².
We define the autocorrelation between x_t and x_{t+j} by

(2)    ρ_j = E x_t x_{t+j} / σ²,

so that ρ_j satisfies Eq. (1) with z_t replaced by zero, and ρ_{-j} = ρ_j.
Let X_j stand for the column vector of the first j observations and X_j' for its
transpose, i.e.,

(3)    X_j' = (x_1, x_2, ..., x_j).

Also write A_j for the covariance matrix of the vector X_j, for j = 1, ..., N. We
note here that the matrix A_j is persymmetric, i.e., symmetric about both the
diagonals. This property will be used to obtain A_N^{-1}.
The distribution of X_N is given by

(5)    dF(X_N) = (2π)^{-N/2} |A_N|^{-1/2} \exp[ -½ X_N' A_N^{-1} X_N ] dX_N.

J. Wise [1] has given a method of finding A_N^{-1} using the spectral density
function. We propose here another method of obtaining A_N^{-1} based on the symmetric
property of A_N.
The distribution of x_1, ..., x_k, z_{k+1}, ..., z_N is given by

       dF(x_1, ..., x_k, z_{k+1}, ..., z_N)
         = (2π)^{-N/2} |A_k|^{-1/2} \exp[ -½ { X_k' A_k^{-1} X_k + \sum_{t=k+1}^{N} z_t^2 } ] dX_k dz_{k+1} ... dz_N.

We shall assume here that N > 2k. Considering (1) as a transformation from
z_t to x_t for t = k + 1, ..., N, we obtain the distribution of X_N as

(6)    dF(X_N) = (2π)^{-N/2} a_0^{N-k} |A_k|^{-1/2}
                 \exp[ -½ { X_k' A_k^{-1} X_k + \sum_{t=k+1}^{N} ( \sum_{i=0}^{k} a_i x_{t-i} )^2 } ] dX_N.

Comparing (5) and (6) we have

(7)    a_0^{2N} |A_N| = a_0^{2k} |A_k|

and

(8)    X_N' A_N^{-1} X_N = X_k' A_k^{-1} X_k + \sum_{t=k+1}^{N} ( \sum_{i=0}^{k} a_i x_{t-i} )^2.
Let C_N be the N × N matrix which has A_k^{-1} in the upper left-hand corner
and zeroes elsewhere, i.e.,

(9)    C_N = \begin{pmatrix} A_k^{-1} & 0 \\ 0 & 0 \end{pmatrix},

and B_N be the matrix of the quadratic form in the second term on the right of
Eq. (8), i.e.,

(10)   X_N' B_N X_N = \sum_{t=k+1}^{N} ( \sum_{i=0}^{k} a_i x_{t-i} )^2,

so that we have

(11)   A_N^{-1} = B_N + C_N.

Denoting by a_{ij}, b_{ij}, and c_{ij}, i, j = 1, 2, ..., N, the elements in the ith
row and jth column of the matrices A_N^{-1}, B_N, and C_N respectively, we have

(12)   a_{ij} = b_{ij} + c_{ij}.

But c_{ij} = 0 if either i or j > k. Hence

(13)   a_{ij} = b_{ij} if either i or j > k.


Now B_N is completely known. In fact, assuming j ≥ i,

(14)   b_{ij} = b_{ji} = \sum_{t=\max(k+1,\, j)}^{\min(N,\, i+k)} a_{t-i} a_{t-j},

an empty sum (which occurs when j − i > k) being read as zero.
Thus all the a_{ij}, except those for which both i and j are less than or equal to
k, are known. Now, since A_N is persymmetric, so is A_N^{-1}. Therefore

(15)   a_{ij} = a_{ji} = a_{N-i+1, N-j+1},   i, j = 1, 2, ..., N.

Using (13) and remembering that N > 2k, we have

(16)   a_{ij} = a_{ji} = b_{N-i+1, N-j+1}   for i, j = 1, 2, ..., k.

Thus A_N^{-1} is completely determined. We now use relations (12) to obtain all
the elements of A_k^{-1}. Once A_k^{-1} is known, we can find A_N^{-1} for any N ≥ k using
(6).

If k is less than 5, we can directly compute A_k and then use Eq. (6) to obtain
A_N^{-1}.
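
The construction in (10) to (16) can be sketched in a few lines of Python. This is an illustrative sketch, not the author's program: the function names are hypothetical, indices are shifted to start at 0, and the check at the end uses the first-order scheme only because its covariance matrix has a simple closed form.

```python
def b_matrix(a, n):
    """B_N of Eq. (10): matrix of sum_{t=k+1}^{N} (sum_i a_i x_{t-i})^2."""
    k = len(a) - 1
    b = [[0.0] * n for _ in range(n)]
    for t in range(k + 1, n + 1):            # 1-based time index, as in the text
        for p in range(k + 1):
            for q in range(k + 1):
                b[t - p - 1][t - q - 1] += a[p] * a[q]
    return b

def inverse_covariance(a, n):
    """A_N^{-1}: equal to B_N outside the leading k x k block (Eq. 13),
    with that block filled in by persymmetry (Eq. 16)."""
    k = len(a) - 1
    if n <= 2 * k:
        raise ValueError("the construction assumes N > 2k")
    inv = b_matrix(a, n)
    for i in range(k):
        for j in range(k):
            # reflect about the secondary diagonal; the source entry lies
            # outside the leading block because N > 2k
            inv[i][j] = inv[n - 1 - j][n - 1 - i]
    return inv

def ar1_covariance(a0, a1, n):
    """Covariance matrix of the stationary scheme a0*x_t + a1*x_{t-1} = z_t."""
    phi = -a1 / a0
    var = 1.0 / (a0 * a0 * (1.0 - phi * phi))
    return [[var * phi ** abs(i - j) for j in range(n)] for i in range(n)]

inv = inverse_covariance([2.0, -1.0], 6)
cov = ar1_covariance(2.0, -1.0, 6)
product = [[sum(cov[i][m] * inv[m][j] for m in range(6)) for j in range(6)]
           for i in range(6)]
print(product[0][:3])   # first row of cov * inv, close to (1, 0, 0)
```

Subtracting the leading k × k block of `b_matrix(a, n)` from that of `inverse_covariance(a, n)` gives A_k^{-1}, as in relation (12).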

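Illustration. Let k = 2. The distribution of X_N is

       dF(X_N) = (2π)^{-N/2} a_0^{N-2} |A_2|^{-1/2}
                 \exp[ -½ { X_2' A_2^{-1} X_2 + \sum_{t=3}^{N} (a_0 x_t + a_1 x_{t-1} + a_2 x_{t-2})^2 } ] dX_N,

so that

B_N = \begin{pmatrix}
a_2^2   & a_1 a_2           & a_0 a_2               & 0                 & \cdots & 0             & 0       \\
a_1 a_2 & a_1^2 + a_2^2     & a_0 a_1 + a_1 a_2     & a_0 a_2           & \cdots & 0             & 0       \\
a_0 a_2 & a_0 a_1 + a_1 a_2 & a_0^2 + a_1^2 + a_2^2 & a_0 a_1 + a_1 a_2 & \cdots & 0             & 0       \\
\vdots  &                   &                       &                   &        &               & \vdots  \\
0       & 0                 & 0                     & 0                 & \cdots & a_0^2 + a_1^2 & a_0 a_1 \\
0       & 0                 & 0                     & 0                 & \cdots & a_0 a_1       & a_0^2
\end{pmatrix}.

Hence

A_N^{-1} = \begin{pmatrix}
a_0^2   & a_0 a_1           & a_0 a_2               & 0                 & \cdots & 0             & 0       \\
a_0 a_1 & a_0^2 + a_1^2     & a_0 a_1 + a_1 a_2     & a_0 a_2           & \cdots & 0             & 0       \\
a_0 a_2 & a_0 a_1 + a_1 a_2 & a_0^2 + a_1^2 + a_2^2 & a_0 a_1 + a_1 a_2 & \cdots & 0             & 0       \\
\vdots  &                   &                       &                   &        &               & \vdots  \\
0       & 0                 & 0                     & 0                 & \cdots & a_0^2 + a_1^2 & a_0 a_1 \\
0       & 0                 & 0                     & 0                 & \cdots & a_0 a_1       & a_0^2
\end{pmatrix},

A_2^{-1} = \begin{pmatrix}
a_0^2 - a_2^2     & a_0 a_1 - a_1 a_2 \\
a_0 a_1 - a_1 a_2 & a_0^2 - a_2^2
\end{pmatrix},

|A_2^{-1}| = (a_0^2 - a_2^2)^2 - (a_0 - a_2)^2 a_1^2,

a_0^{2N} |A_N| = a_0^4 |A_2|.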

It may be mentioned here that Ulf Grenander and Murray Rosenblatt ([2],
pp. 238-239) have considered asymptotic properties of A_N^{-1} as N tends to
infinity. They, however, do not attempt to determine the k² elements standing
in the first k rows and the first k columns of A_N^{-1}, although they
suggest a method of orthogonalization of the vector X_N.
A PROBLEM OF BERKSON, AND MINIMUM VARIANCE
ORDERLY ESTIMATORS¹


By John W. Tukey
Princeton University


1. Summary. The distinction between efficiency in the asymptotic sense
originally introduced by Fisher ([2], 1925, p. 703), and the finite sample sense
sometimes used by others has been recently stressed by various writers (e.g., Berkson
[1]). The technique of proof used below was originally developed to provide a
simple example where the maximum likelihood estimate of location, though
asymptotically efficient, was not of minimum variance for any finite sample
size whatever. The (symmetrical) double exponential distribution with known
scale, where the sample median is the maximum likelihood estimator of location,
could easily be shown to be such an example. (While this result is useful in
deflating unwarranted views about minimum variance properties of maximum
likelihood estimates, Fisher's ([2], p. 716) results about intrinsic accuracy in the
same situation are of more basic interest.)

On examination, however, the technique used to provide this rather isolated
and special result was found capable of showing, for a class of distributions with
suitable monotony properties (in particular all distributions for which f'(y)/f(y)
is monotone decreasing, and all normal, exponential, gamma and beta
distributions), that the covariances of the order statistics in a sample of any chosen size
are monotone in either index separately.

Received March 22, 1957.
¹ Prepared in connection with research sponsored by the Office of Naval Research.
Based, in part, on Memorandum Report 11 of the Statistical Research Group, Princeton
University, which was issued 25 October 1948.
2. Complete regression. We shall say the distribution of z given y shows
complete negative regression on y if the cumulative distribution F(z | y) satisfies

       F(z | y'') ≤ F(z | y')   for y'' < y',

provided the equality does not always hold. We define complete positive
regression analogously. We notice that:
(A_1) If the distribution of z given y shows complete negative regression on y, and
z_k is an order statistic from a random sample of z's, then the distribution
of z_k given y shows complete negative regression on y.
(A_2) If the distribution of w given z shows complete negative regression on z,
and the distribution of z given y shows complete positive regression on y,
and the distribution of w given z is unaffected by giving y, then the
distribution of w given y shows complete negative regression on y.
(A_3) If the distribution of z given y shows complete negative regression on y,
then cov {z, y} < 0.
The first result follows from the beta-function formula for an order statistic
cumulative,

       G_{k|n}(w) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^{F(w)} u^{k-1} (1-u)^{n-k} \, du,

which follows from the interpretation of G_{k|n}(w) as the probability of k or more
out of n falling in an interval of probability F(w), and which shows that the
cumulative of z_{k|n} is a monotone function of the cumulative of z. The second
follows easily by introducing the monotone representing function [3] z_y(u)
corresponding to F(z | y) such that if u is uniformly distributed on [0, 1], then z_y(u)
is distributed according to F(z | y). The hypothesis of complete positive
regression is equivalent to z_{y'}(u) ≥ z_{y''}(u) for y' ≥ y'', and we have

       H(w | y') = \int G(w | z_{y'}(u)) \, du ≥ \int G(w | z_{y''}(u)) \, du = H(w | y'')   for y' ≥ y'',
which we desired to show. The third result follows from the fact that 


which follows directly from the inequality for the representing functions. 
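
The interpretation of the order-statistic cumulative as the chance of k or more out of n observations falling below w can be tried numerically. The sketch below is only an illustration with a hypothetical function name. It evaluates that binomial tail as a function of p = F(w) and shows that it increases with p, which is the monotonicity used for the first result.

```python
from math import comb

def order_statistic_cdf(k, n, p):
    """P(k-th smallest of n observations <= w) when F(w) = p,
    i.e. the probability that k or more of the n fall at or below w."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

grid = [i / 10.0 for i in range(11)]
values = [order_statistic_cdf(3, 5, p) for p in grid]
print(values)   # increases from 0 at p = 0 to 1 at p = 1
```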


3. Subexponential distributions. We shall say that a cumulative distribution
F is subexponential to the right if

       \frac{1 - F(z + h)}{1 - F(h)}

is monotonically decreasing for fixed z > 0 as h increases. We notice that this
is equivalent to stating that, referred to the point of truncation, the distribution
of z after truncation on the left shows complete negative regression on the
point of truncation. We define subexponential on the left, or in both directions,
analogously. We are now prepared to demonstrate:
(B_1) If F(q) = F(y − θ) is subexponential on the right, and y_1 ≤ y_2 ≤ ...
≤ y_n are an ordered sample of y's (and q_1 ≤ q_2 ≤ ... ≤ q_n are
an ordered sample of q's), then for any j < k we have

       cov {y_k − y_j, y_j} = cov {q_k − q_j, q_j} < 0.


(The analogs for “on the left” or “in both directions” clearly follow by
symmetry.) The proof rests on Wald's principle ([4], p. 536) according to which the
distribution of q_k given q_j is that of the (k − j − 1)st order statistic from a
sample of n − j from the result of truncating the original distribution at q_j.
The distribution of q_k − q_j for q_j fixed is that of a similar order statistic from
the truncated distribution referred to its point of truncation as origin, and as
remarked above this latter distribution shows complete negative regression on
q_j. By (A_1) the same is true of any distribution of an order statistic, and hence
for the distribution of q_k − q_j. The negativity of the covariance follows from
(A_3).
This result (and its analogs) can easily be extended to
(B_2) Under the hypotheses of (B_1), if h ≤ j < k, then

       cov {y_k − y_j, y_h} = cov {q_k − q_j, q_h} < 0.

For, since the distribution of q_k given q_j is not affected by giving q_h, and the
distribution of q_j given q_h shows complete positive regression on q_h, we may
apply (A_2) to complete the proof.
As a corollary we have the curiously simple results:
(B_3) If the distribution of q is subexponential in both directions, then the
covariance of any two order statistics is less than the variance of either.
(B_4) If the distribution of q is subexponential in both directions, the
covariance between order statistics q_j, q_k is monotone in j and k separately,
decreasing as j and k separate from one another.
The interest of these results is enhanced when we observe that normal,
exponential, gamma and beta distributions, pristine or truncated, are all
subexponential in both directions.
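
The defining ratio (1 − F(z + h))/(1 − F(h)) is easy to tabulate. The sketch below uses hypothetical helper names, the standard normal and the unit exponential as test cases, and the right-hand direction only. It suggests that the ratio falls steadily with h for the normal, while for the exponential it stays constant, which is the boundary case of the definition.

```python
import math

def normal_survival(x):
    """1 - F(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def exponential_survival(x):
    """1 - F(x) for the unit exponential distribution."""
    return math.exp(-x) if x > 0 else 1.0

def survival_ratio(survival, z, h):
    """(1 - F(z + h)) / (1 - F(h)): chance of exceeding h by z, given h is exceeded."""
    return survival(z + h) / survival(h)

normal_ratios = [survival_ratio(normal_survival, 1.0, h) for h in range(-3, 4)]
exponential_ratios = [survival_ratio(exponential_survival, 1.0, h) for h in range(0, 4)]
print(normal_ratios)        # strictly decreasing in h
print(exponential_ratios)   # constant, equal to exp(-1)
```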


4. Monotone location-scores. By definition, a distribution F(q) is subexponential
to the right if


F(y + h) — Fh) 


1 — Fly + h) 


Ga h) SS eee = |] —— aiieneeadiaicaiaas 
, 1 — Fh) 


1 Fh) 

is monotone decreasing as h increases for every y. This is equivalent to 
log {1 — F(y + h)} — log {1 — F(h)} 

being monotone increasing, or, granting differentiability, to 


fh) _ fly +h) 


1—Fih) 1—Fyt+h) 
where y > 0, and hence to

       \frac{d}{du} \log \{1 - F(u)\} = -\frac{f(u)}{1 - F(u)} = \frac{\int_u^\infty f'(v)\, dv}{\int_u^\infty f(v)\, dv}

being monotone decreasing. This will follow from the monotone decreasing
character of (log f(u))' = f'(u)/f(u) since f(u) ≥ 0. It is thus sufficient, but
not necessary, for subexponentiality on the right that f'(u)/f(u), which is the
location score associated with the specification consisting of all translations of
F(u), be monotone decreasing.
The class of distributions with monotone scores for location is immediately
seen to be closed under formal multiplication of densities, so that if
(i) F(u), G(u) have monotone scores for location,
(ii) F(u) = ∫ f(u) du, G(u) = ∫ g(u) du,
(iii) H(u) = (constant) ∫ f(u) g(u) du,
then H(u) also has a monotone score for location. The class is also closed under
truncation at one or both ends. It is immediately seen to include all distributions
whose shapes are single exponential, double exponential (balanced or not),
normal, (incomplete) gamma, (incomplete) beta, and their formal products
and truncations. (It does not include distributions of Cauchy shape.)

Since the large-sample optimum weight to be assigned to an order statistic 
is the negative of the derivative of the score for location at the typical point 
of distribution, it seems both peculiarly appropriate and highly reasonable 
that the minimum variance orderly estimator of location will actually have 
all its coefficients positive for any distribution with monotone score for loca- 
tion. 


5. Orderly estimates. We now turn to a location specification F(y | θ) =
F(y − θ) and to orderly estimates of θ, by which we mean linear combinations
of order statistics of total weight 1, i.e.,

       ŷ = Σ w_i y_i + c,   Σ w_i = 1.

(Notice that the variance and bias of ŷ as an estimator of θ are exactly the
variance and average value of q̂, where q̂ = Σ w_i q_i + c and the q_i are order
statistics in a sample from F(q).) We begin with a general result, applicable to any convex
(i.e., closed under at' + (1 − a)t'' for 0 < a < 1) class of estimates of θ which
contains all order statistics (and thus surely contains all orderly estimates).
(C) If t is the minimum variance estimate in any convex class containing the
order statistics, and y_k is any order statistic (or any linear combination of
order statistics of total weight one), then

       cov {y_k − t, t} ≥ 0.

This follows easily by considering the variance of t + λ(y_k − t), where, in
particular, if cov {y_k − t, t} < 0, a value of λ > 0 will provide lesser variance
than for λ = 0.
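
The variance computation behind (C) can be written out explicitly. The display below is a sketch of that one step, using only the symbols t, y_k and λ of the text.

```latex
\operatorname{var}\{t + \lambda (y_k - t)\}
  = \operatorname{var}\{t\}
  + 2\lambda \operatorname{cov}\{y_k - t,\, t\}
  + \lambda^{2} \operatorname{var}\{y_k - t\}.
```

For 0 < λ < 1 the estimate t + λ(y_k − t) = (1 − λ)t + λ y_k remains in the convex class, and for small λ the linear term dominates the quadratic one, so a negative covariance would contradict the minimality of var {t}.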

From the result it is easy to show that: 

(D) If F(q) is subexponential to the right, then no single order statistic,
except possibly the righthandmost, is of minimum variance among orderly
estimates of location.

(Again, the analogs with “to the left ... the lefthandmost” or “in both
directions ... statistic,” follow by symmetry.) For if y_j were of minimum variance,
and y_n the righthandmost, then by (B_1)

       cov {y_n − y_j, y_j} = cov {q_n − q_j, q_j} < 0,

and by (C) y_j is not of minimum variance. It is reasonable to anticipate that,
actually, all coefficients must be positive (particularly for distributions with 
monotone scores), but the elementary methods used here do not seem to show 
this easily. 
AN ELEMENTARY THEOREM CONCERNING STATIONARY
ERGODIC PROCESSES


By Leo Breiman
University of California, Berkeley

1. Introduction. The purpose of this note is to state and prove a theorem 
concerning strictly stationary, ergodic processes and to give some of its applica- 
tions. Although the theorem itself is a simple consequence of the ergodic theorem, 
its applications include a proof of the consistency of the maximum likelihood 
estimates for stationary distributions and an extension of the zero-one law for 
symmetric sets given by Hewitt and Savage [1]. 


THEOREM 1. Let ..., x_{-1}, x_0, x_1, ... be a strictly stationary process such that
every set invariant under shifts has measure zero or one. Let {φ_n} be a sequence of
real-valued functions, φ_n being a measurable function of n + 1 variables. Then if
the sequence φ_n(x_0, ..., x_n) and the sequence φ_n(x_{-n}, ..., x_0) both converge in
probability, their limits are almost surely constant and equal.


Received June 25, 1957; revised October 25, 1957. 
PROOF. We assume that φ_n(x_{-n}, ..., x_0) → φ^- and φ_n(x_0, ..., x_n) → φ^+
in probability. Let f be any differentiable function such that |f| ≤ 1, |f'| ≤ 1, and let

       f_n^- = f(φ_n(x_{-n}, ..., x_0)),   f_n^+ = f(φ_n(x_0, ..., x_n)).

Since |f_n^- − f(φ^-)| ≤ |φ_n(x_{-n}, ..., x_0) − φ^-|, we have that f_n^- → f(φ^-) in
probability. From the uniform boundedness of the f_n^-, it follows that
∫ |f_n^- − f(φ^-)| → 0, and similarly ∫ |f_n^+ − f(φ^+)| → 0. We denote by T the shift
operation, so that

       T^n φ_n(x_{-n}, ..., x_0) = φ_n(x_0, ..., x_n).

By the ergodic theorem,

       \lim_n \int \Big| E f(φ^-) - \frac{1}{n} \sum_{k=1}^{n} T^k f(φ^-) \Big| = 0.

Hence,

       \int | E f(φ^-) - f(φ^+) | \le \limsup_n \frac{1}{n} \sum_{k=1}^{n} \int | T^k f(φ^-) - T^k f_k^- |
                                   + \limsup_n \frac{1}{n} \sum_{k=1}^{n} \int | f_k^+ - f(φ^+) |.

Since T is measure preserving, the terms on the right may be reduced to

       \limsup_n \frac{1}{n} \sum_{k=1}^{n} \int | f(φ^-) - f_k^- | + \limsup_n \frac{1}{n} \sum_{k=1}^{n} \int | f_k^+ - f(φ^+) | = 0.

We conclude that f(φ^+) is almost surely constant, which proves the theorem.


2. Applications. We use the above theorem first to prove a result concerning
maximum likelihood ratios, which is a special case of a theorem due to C. Kraft
[2].

THEOREM 2. Let the ..., x_{-1}, x_0, x_1, ... process be distributed according to
the stationary ergodic measure P with density functions p_n(·, ..., ·) and let Q
be any other stationary measure with density functions q_n(·, ..., ·) such that P
is not absolutely continuous with respect to Q. Then almost surely (a.s.)

       \lim_n \frac{q_n(x_0, ..., x_n)}{p_n(x_0, ..., x_n)} = 0.

PROOF. Let φ_n = q_n(·, ..., ·)/p_n(·, ..., ·); it is well known ([3], pp. 93,
348) that the sequence −φ_n(x_0, ..., x_n) forms a semi-martingale with respect
to the fields B_n generated by x_0, ..., x_n. Similarly, the sequence

       −φ_n(x_{-n}, ..., x_0)

forms a semi-martingale with respect to the fields B_{-n} generated by
x_{-n}, ..., x_0. Since in both cases the first absolute moments are bounded by one,
both sequences converge a.s. From our main theorem we conclude that there
is some constant a such that

       \lim_n \frac{q_n(x_0, ..., x_n)}{p_n(x_0, ..., x_n)} = a   (a.s.).
Now for any finite dimensional cylinder set J, we have for all n sufficiently
large,

       Q(J) ≥ \int_J φ_n(x_0, ..., x_n) \, dP.

Equality would obtain, except that p_n may vanish on some set of positive
measure where q_n does not. Using the Fatou lemma,

       Q(J) ≥ \liminf_n \int_J φ_n(x_0, ..., x_n) \, dP ≥ \int_J \liminf_n φ_n(x_0, ..., x_n) \, dP = aP(J).

Since the above is true for any cylinder set, it is also true for any finite disjoint
union of cylinder sets, and thus is true in general, contradicting our hypothesis
unless a = 0.
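
A small simulation gives a feel for Theorem 2 in the simplest stationary ergodic case, independent coin tosses. In the sketch below P is taken as Bernoulli(0.5) and Q as Bernoulli(0.6); these choices, the seed, and the function name are assumptions made for the illustration. On the logarithmic scale the ratio q_n/p_n drifts steadily downward, so the ratio itself tends to zero.

```python
import math
import random

def log_likelihood_ratio(xs, p, q):
    """log of q_n(x_0..x_n) / p_n(x_0..x_n) for independent Bernoulli(q) against Bernoulli(p)."""
    total = 0.0
    for x in xs:
        total += math.log(q if x else 1.0 - q) - math.log(p if x else 1.0 - p)
    return total

rng = random.Random(0)                                   # fixed seed for repeatability
sample = [rng.random() < 0.5 for _ in range(20000)]      # drawn from P

for n in (100, 1000, 20000):
    print(n, log_likelihood_ratio(sample[:n], 0.5, 0.6))
```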

Another application of Theorem 1 results in an extremal property of random 
variables having symmetric distributions. While we do not know of an explicit 
statement of this theorem, it can also be proven using de Finetti’s representa- 
tion theorem (see, for example [1] and [4], p. 364), without too much difficulty. 

THEOREM 3. Let x_0, x_1, ... be a sequence of random variables whose finite
dimensional distribution functions are invariant under permutations of the
arguments and such that every “tail” event has measure zero or one. Then the sequence
is equivalent to a sequence of independent random variables.

PROOF. From the symmetry of the x_0, x_1, ... sequence follows its
stationarity. By the usual procedure we extend the measure to the double-ended
sequence ..., x_{-1}, x_0, x_1, ..., noticing that the symmetry is preserved under
this extension. We also verify that the zero-one hypothesis implies the process
is ergodic so that we may apply Theorem 1. We define ¢,(x%0 , 71, °-* , 2; 
p(t. S a|x,-+-+,2,). Then, by symmetry and stationarity 

On(Tn, °** » Xo) = P(M%i BS A|%,°°* , Tn). 


By the martingale convergence theorem, both 
Dal Sos ° > 5 Be)s Palins *** » Be) 
converge a.s. and we conclude that both 
Wt SO), tH, >** jy Ma S @) xX, %a,*>*) 


are a.s. constant, which proves the theorem. 
A more specialized consequence of Theorem 1 runs as follows. 


THEOREM 4. Let x_0, x_1, ... be a sequence of identically distributed,
independent random variables, and {φ_n} a sequence of real-valued functions, φ_n a
measurable function of n + 1 variables. Then if both φ_n(x_0, ..., x_n) and

       φ_n(x_n, ..., x_0)

converge in probability, their limits are a.s. constant.
PROOF. By the usual procedure we extend the measure on x_0, x_1, ... to a
measure on the two-sided process ..., x_{-1}, x_0, x_1, .... The set

       [ |φ_m(x_{-m}, ..., x_0) − φ_n(x_{-n}, ..., x_0)| > ε ]

has the same measure as the set [ |φ_m(x_m, ..., x_0) − φ_n(x_n, ..., x_0)| > ε ].
Hence φ_n(x_{-n}, ..., x_0) converges in probability and Theorem 1 applies.

The above theorem is an extension of the Hewitt-Savage zero-one law for 
symmetric sets, as the following theorem makes clear. 

THEOREM 5. Let x_0, x_1, ... be a sequence of identically distributed,
independent random variables and f any integrable function on the process such that f is
invariant under finite permutations of the coordinates. Then f is a.s. constant.

PROOF. Let φ_n(x_0, ..., x_n) = E(f | x_0, ..., x_n). Then φ_n(x_n, ..., x_0) =
φ_n(x_0, ..., x_n) by the symmetry of f, and the φ_n(x_0, ..., x_n) sequence forms a
martingale which converges a.s. to f. The conclusion follows from Theorem 4.


REFERENCES
[1] E. Hewitt and L. J. Savage, “Symmetric measures on Cartesian products,” Trans.
Amer. Math. Soc., Vol. 80 (1955), pp. 470-501.
[2] C. Kraft, “Some conditions for consistency and uniform consistency of statistical
procedures,” Univ. of California Publ. Stat., Vol. 2, No. 6 (1955), pp. 125-141.
[3] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.
[4] M. Loève, Probability Theory, D. Van Nostrand, New York, 1955.
A TEST OF FIT FOR MULTIVARIATE DISTRIBUTIONS¹
By Lionel Weiss
Cornell University


Received August 8, 1957.

1. Summary and introduction. Suppose $Y$ is a chance variable taking values in $k$-dimensional Euclidean space. That is, $Y = (Y_1, \dots, Y_k)$, where $Y_i$ is a univariate chance variable. The joint distribution of $(Y_1, \dots, Y_k)$ has density $f(y_1, \dots, y_k)$, say.

We shall call a function $h(y_1, \dots, y_k)$ "piecewise continuous" if it is everywhere bounded, and $k$-dimensional Euclidean space can be broken into a finite number of Borel-measurable subregions, such that in the interior of each subregion $h(y_1, \dots, y_k)$ is continuous, and the set of all boundary points of all subregions has Lebesgue measure zero.

We assume that $f(y_1, \dots, y_k)$ is piecewise continuous. Let $h(y_1, \dots, y_k)$ be some given nonnegative piecewise continuous function, and let $X_1, \dots, X_n$ be independent chance variables, each with the density $f(y_1, \dots, y_k)$. Choose a nonnegative number $t$, and for each $i$, construct a $k$-dimensional sphere with center at $X_i = (Y_{i1}, \dots, Y_{ik})$ and of $k$-dimensional volume
$$\frac{t\,h(Y_{i1}, \dots, Y_{ik})}{n}.$$
Such a sphere will be called "of type $s$" if it contains exactly $s$ of the $(n - 1)$ points $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n$. Let $R_n(t; s)$ denote the proportion of the $n$ spheres which are of type $s$.
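The definition of $R_n(t; s)$ can be made concrete with a short computation. The following is a minimal sketch, not the author's procedure: the names `ball_radius` and `proportion_of_type` are illustrative, and the weight function `h` is assumed to be supplied by the caller.

```python
import math


def ball_radius(volume, k):
    """Radius of the k-dimensional Euclidean ball with the given volume."""
    unit_ball_volume = math.pi ** (k / 2) / math.gamma(k / 2 + 1)
    return (volume / unit_ball_volume) ** (1.0 / k)


def proportion_of_type(points, h, t, s):
    """R_n(t; s): proportion of the n spheres containing exactly s other points.

    points : list of k-tuples (the observations X_1, ..., X_n)
    h      : nonnegative function of a k-tuple (hypothetical, caller-supplied)
    t      : nonnegative number
    s      : nonnegative integer
    """
    n = len(points)
    k = len(points[0])
    count = 0
    for i, center in enumerate(points):
        # Sphere centered at X_i with k-dimensional volume t * h(X_i) / n.
        radius = ball_radius(t * h(center) / n, k)
        inside = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(center, other) <= radius
        )
        if inside == s:
            count += 1
    return count / n
```

For example, `proportion_of_type(points, lambda y: 1.0, 1.0, 1)` uses spheres of constant volume $1/n$ and counts those containing exactly one other observation.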

For typographical simplicity, we denote the vector $(y_1, \dots, y_k)$ by $y$. Let $S(t; s)$ denote the multiple integral
$$S(t; s) = \frac{t^s}{s!} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h^s(y)\, f^{s+1}(y) \exp\{-t\,h(y) f(y)\}\, dy_1 \cdots dy_k.$$

It is shown that $R_n(t; s)$ converges stochastically to $S(t; s)$ as $n$ increases. This result is then used to construct a test of the hypothesis that the unknown density $f(y)$ is equal to a given density $g(y)$.


2. Proof of the convergence of $R_n(t; s)$. We define the chance variable $Z_i$ to be equal to one if the sphere centered at $X_i$ is of type $s$, and to be equal to zero otherwise. Then $R_n(t; s) = (1/n)(Z_1 + \dots + Z_n)$.

Let $V(v; y)$ denote the probability assigned by the density $f(y)$ to the sphere of volume $v$ and center at $y$. In any closed region in which $f(y)$ is continuous, $V(v; y)$ can be written as $f(y)\,v + v\,\epsilon(y; v)$, where $\epsilon(y; v)$ approaches zero as $v$ approaches zero, uniformly in $y$ throughout the region. Clearly, $E\{Z_i\}$ is equal to
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{(n-1)!}{s!\,(n-1-s)!} \left[V\!\left(\frac{t h(y)}{n}; y\right)\right]^{s} \left[1 - V\!\left(\frac{t h(y)}{n}; y\right)\right]^{n-1-s} f(y)\, dy_1 \cdots dy_k. \tag{2.1}$$

The region of integration can be broken into a finite number of subregions, in the interior of each of which $f(y)$ and $h(y)$ are continuous. A closed subset of each such subregion can be found so that the measure of the set of points in $k$-dimensional space outside these closed subsets is arbitrarily small. Within each such closed subset, we may write
$$V\!\left(\frac{t h(y)}{n}; y\right) = f(y)\,\frac{t h(y)}{n} + \frac{t h(y)}{n}\,\epsilon\!\left(y; \frac{t h(y)}{n}\right),$$
where $\epsilon(y; t h(y)/n)$ approaches zero as $n$ increases, uniformly in $y$ in the closed subset. Then it follows easily that the multiple integral (2.1) converges to $S(t; s)$ as $n$ increases, so that $E\{R_n(t; s)\}$ approaches $S(t; s)$ as $n$ increases.

To complete the proof that $R_n(t; s)$ converges stochastically to $S(t; s)$, we shall show that $\operatorname{Var}\{R_n(t; s)\}$ approaches zero as $n$ increases. $\operatorname{Var}\{R_n(t; s)\}$ is equal to
$$\frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}\{Z_i\} + \frac{1}{n^2} \sum_{i \ne j} \operatorname{Cov}\{Z_i, Z_j\}.$$

Since the $Z_i$ are uniformly bounded variables, $(1/n^2) \sum \operatorname{Var}\{Z_i\}$ approaches zero as $n$ increases. Therefore, to show that $\operatorname{Var}\{R_n(t; s)\}$ approaches zero, it will suffice to show that $E\{Z_i Z_j\}$ approaches $[S(t; s)]^2$ as $n$ increases, since this implies that $\operatorname{Cov}\{Z_i, Z_j\}$ approaches zero. $E\{Z_i Z_j\}$ is equal to
$$\int \cdots \int_{R_1} \frac{(n-2)!}{s!\,s!\,(n-2-2s)!} \left[V\!\left(\frac{t h(y)}{n}; y\right)\right]^{s} \left[V\!\left(\frac{t h(z)}{n}; z\right)\right]^{s} \left[1 - V\!\left(\frac{t h(y)}{n}; y\right) - V\!\left(\frac{t h(z)}{n}; z\right)\right]^{n-2-2s} f(y) f(z)\, dy_1 \cdots dy_k\, dz_1 \cdots dz_k$$
$$+ \int \cdots \int_{R_2} Q(y, z)\, f(y) f(z)\, dy_1 \cdots dy_k\, dz_1 \cdots dz_k, \tag{2.2}$$
where $R_1$ is the region in $(y, z)$ space such that the $k$-dimensional sphere of volume $t h(y)/n$ centered at $y$ does not intersect the $k$-dimensional sphere of volume $t h(z)/n$ centered at $z$; $R_2$ is the remainder of $(y, z)$ space; and $Q(y, z)$ is the conditional probability that the spheres around $X_i$ and $X_j$ are both of type $s$, given that $X_i = y$ and $X_j = z$. Clearly, the second integral in (2.2) approaches zero as $n$ increases, and the first approaches $[S(t; s)]^2$. This completes the proof that $R_n(t; s)$ converges stochastically to $S(t; s)$ as $n$ increases.


3. Application to multivariate tests of fit. Suppose the density of $Y$, $f(y_1, \dots, y_k)$ say, is unknown, and the hypothesis to be tested is that almost everywhere over a given region $R$, $f(y_1, \dots, y_k) = g(y_1, \dots, y_k)$, where $g(y_1, \dots, y_k)$ is a given piecewise continuous function, $g(y_1, \dots, y_k) \ge B > 0$ at every point of $R$, and
$$1 \ge \int \cdots \int_R g(y)\, dy_1 \cdots dy_k > 0.$$
The hypothesis says nothing about $f(y_1, \dots, y_k)$ outside the region $R$.

To construct a test of this hypothesis, we apply the result of Section 2 with $t = 1$, $s = 1$, and $h(y_1, \dots, y_k) = 1/g(y_1, \dots, y_k)$ for $(y_1, \dots, y_k)$ in $R$, $h(y_1, \dots, y_k) = 0$ elsewhere. Then
$$S(1; 1) = \int \cdots \int_R \frac{f^2(y)}{g(y)} \exp\left\{-\frac{f(y)}{g(y)}\right\} dy_1 \cdots dy_k.$$
Using the fact that the function $u e^{-u}$ takes on its absolute maximum at $u = 1$, we find that
$$S(1; 1) \le e^{-1} \int \cdots \int_R f(y)\, dy_1 \cdots dy_k,$$
with equality holding if and only if $g(y) = f(y)$ almost everywhere on $R$ where $f(y) > 0$. Denote by $Q(n)$ the proportion of the observed points $X_1, X_2, \dots, X_n$ that fall in the region $R$. $Q(n)$ converges stochastically to
$$\int \cdots \int_R f(y)\, dy_1 \cdots dy_k$$
as $n$ increases. Thus, if the hypothesis is true, $R_n(1; 1)$ converges stochastically to
$$e^{-1} \int \cdots \int_R g(y)\, dy_1 \cdots dy_k,$$
and $Q(n)$ converges stochastically to
$$\int \cdots \int_R g(y)\, dy_1 \cdots dy_k.$$
Conversely, if $R_n(1; 1)$ converges stochastically to
$$e^{-1} \int \cdots \int_R g(y)\, dy_1 \cdots dy_k$$
and $Q(n)$ converges stochastically to
$$\int \cdots \int_R g(y)\, dy_1 \cdots dy_k,$$
then
$$S(1; 1) = e^{-1} \int \cdots \int_R f(y)\, dy_1 \cdots dy_k,$$
so the hypothesis is true.
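The bound on $S(1; 1)$ used above can be written out in one line. The following is only a worked restatement, with $u = f(y)/g(y)$ introduced here as shorthand on $R$:

```latex
\[
S(1;1) = \int_R f(y)\,\frac{f(y)}{g(y)}\,e^{-f(y)/g(y)}\,dy
       = \int_R f(y)\,u\,e^{-u}\,dy
       \le e^{-1}\int_R f(y)\,dy ,
\qquad u = \frac{f(y)}{g(y)} .
\]
```

Since $u e^{-u} < e^{-1}$ unless $u = 1$, equality forces $f(y) = g(y)$ almost everywhere on the part of $R$ where $f(y) > 0$.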
For a given $n$, we define the following test $T_n$ of the hypothesis. Accept the hypothesis if and only if
$$\left| R_n(1; 1) - e^{-1} \int \cdots \int_R g(y)\, dy_1 \cdots dy_k \right| < A_n$$
and
$$\left| Q(n) - \int \cdots \int_R g(y)\, dy_1 \cdots dy_k \right| < B_n,$$
where $A_n$, $B_n$ are numbers chosen to give the desired level of significance, and it may (and will) be assumed that $A_n$ and $B_n$ both approach zero as $n$ increases. From the discussion above, it is clear that the sequence of tests $\{T_n\}$ is consistent against any piecewise continuous alternative $f(y)$. To set the exact values of $A_n$, $B_n$ the joint distribution of $Q(n)$ and $R_n(1; 1)$ would be required, but this distribution is unknown, although the author conjectures that it is asymptotically normal. However, given the function $g(y)$, the region $R$, and an alternative $f(y)$, the integrals (2.1) and (2.2) can be evaluated, at least approximately, and then Chebyshev inequalities can be used to give an upper bound to the level of significance and a lower bound to the power, for a given choice of $A_n$ and $B_n$.

There are other consistent tests for the hypothesis under discussion: the chi-square test and obvious extensions of the univariate Kolmogorov-Smirnov and von Mises tests. A comparison of the power functions of these tests would be of great interest. Almost nothing is known of the small-sample power of any of these tests. The large-sample power of the chi-square test is known. It is the author's conjecture that the limiting joint distribution of $Q(n)$ and $R_n(1; 1)$ is bivariate normal under the alternatives as well as under the hypothesis. If this conjecture could be proved, the asymptotic power of the proposed test
would be known.

TABLES FOR OBTAINING NON-PARAMETRIC TOLERANCE LIMITS

By Paul N. Somerville
General Analysis Corporation, Sierra Vista, Arizona


The general consideration of non-parametric tolerance limits had its origin with Wilks [10]. Wilks showed that for continuous populations, the distribution of P, the proportion of the population between two order statistics from a random sample, was independent of the population sampled, and was in fact a function only of the particular order statistics chosen. Wald [9] and Tukey [8] extended the method to multivariate populations, Tukey being responsible for the term "statistically equivalent block." Their work was extended further by Fraser [2], [3]. Murphy [4] presented graphs of minimum probable coverage by sample blocks determined by order statistics of a sample from a population with a continuous but unknown c.d.f. This note extends the results of Murphy, and tabulates the results in a manner which eliminates or minimizes interpolation, particularly with respect to m, in a large number of cases. The form of Table I parallels the tables of Eisenhart, Hastay and Wallis [1], "Tolerance Factors for Normal Distributions."

Let P represent the proportion of the population between the rth smallest and the sth largest value in a random sample of n from a population having a continuous but unknown distribution function. Table I gives the largest value of m = r + s such that we have confidence of at least $\gamma$ that 100P percent of the population lies between the rth smallest and sth largest in the sample. Note that we may choose any r, s $\ge$ 0 such that r + s = m. We must, of course, decide upon the values of r and s independently of the observations in the sample. We obtain one-sided confidence intervals when we use r = 0 or s = 0 for a given m. The values of m are the largest such that
$$\gamma \le I_{1-P}(m,\, n - m + 1),$$
where $I$ is the incomplete Beta function tabulated in [5] and [7].
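The tabulated quantity can be recomputed directly. The sketch below is an illustration, not the method used to prepare the tables: it assumes the standard identity that, for integer m, $I_x(m, n - m + 1)$ equals the probability that a Binomial(n, x) variable is at least m, and the function names are hypothetical.

```python
from math import comb


def confidence(n, m, P):
    """gamma = I_{1-P}(m, n - m + 1), evaluated through its binomial-tail form.

    For integer m >= 1, I_x(m, n - m + 1) = Pr{Binomial(n, x) >= m}.
    """
    x = 1.0 - P
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j) for j in range(m, n + 1))


def largest_m(n, P, gamma):
    """Largest m = r + s whose confidence is at least gamma (0 if there is none)."""
    best = 0
    for m in range(1, n + 1):
        if confidence(n, m, P) >= gamma:
            best = m
        else:
            break  # the confidence decreases as m grows
    return best
```

With m = 2 (r = s = 1) the function `confidence` gives the kind of entry described for Table II, the confidence attached to the interval between the smallest and largest observations.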
Received March 21, 1957; revised January 15, 1958.

TABLE I
Values of m = r + s such that we may assert with confidence at least $\gamma$ that 100P percent of a population lies between the rth smallest and the sth largest of a random sample of n from that population (continuous distribution function assumed)

TABLE II
Confidence $\gamma$ with which we may assert that 100P percent of the population lies between the largest and smallest of a random sample of n from that population (continuous distribution assumed)

Table II gives the confidence $\gamma$ that 100P percent of the population lies between the largest and smallest of a random sample of n.

In the case where we are dealing with a multivariate population, we take m to be the number of blocks (see Tukey [8]) excluded from the tolerance region.

NONPARAMETRIC ESTIMATION OF SAMPLE PERCENTAGE POINT STANDARD DEVIATION

By John E. Walsh
Military Operations Research Division, Lockheed Aircraft Corporation, Burbank, California
1. Summary. The available data consist of a random sample $x(1) < \dots < x(n)$ from a reasonably well-behaved continuous statistical population. The problem is to estimate the standard deviation of a specified $x(r)$ that is not in the tails of the sample. The estimates examined are of the form
$$a[x(r + i) - x(r - i)]$$
and the explicit problem consists of determining suitable values for $a$ and $i$. The solution
$$i = (n + 1)^{4/5}, \qquad a = \tfrac{1}{2}(n + 1)^{-3/10}\,[r/(n + 1)]^{1/2}\,[1 - r/(n + 1)]^{1/2},$$
appears to be satisfactory. Then the expected value of the estimate equals the standard deviation of $x(r)$ plus $O(n^{-9/10})$; also the standard deviation of this estimate is $O(n^{-9/10})$. That is, the fixed and random errors for this point estimate are of the same order of magnitude with respect to $n$. Solutions can be obtained which decrease the order of one of these types of error. However, these solutions increase the order of the other type of error, so that the over-all error magnitude exceeds $O(n^{-9/10})$.

Received November 15, 1957.

2. Introduction and results. A sample percentage point $x(r)$ furnishes a point estimate of the corresponding population percentage point $\theta[r/(n + 1)]$, where $\theta(p) = \theta[p]$ represents the 100p percent point of the population sampled. The appropriateness of $x(r)$ as an estimate depends on its variability. Thus an estimate of the standard deviation of $x(r)$ can be of value. This paper presents an easily computed nonparametric estimate of the standard deviation of $x(r)$ that is valid for most continuous populations of practical interest and has favorable properties compared to other estimates of the same type.

The estimate derived is based on the results of [1]. The expected value and 
variance-covariance expansions of [1] are assumed to be valid for the continu- 
ous statistical population sampled. In particular, this population is assumed 
to have a probability density function that is analytic and nonzero at all points 
of interest. These requirements appear to be satisfied for most practical situa- 
tions that involve continuous populations. 

The derived estimate properties are approximate in the sense that terms of specified orders of $n$ are neglected. The order results stated are not applicable for the case of extreme observations. That is, $p_r = r/(n + 1)$ and $q_r = 1 - p_r$ are assumed to be bounded away from 0 and 1. Also a standard deviation estimate is not necessarily reasonably accurate even for situations where the order relations are valid. In some cases the neglected terms may be important even though they are of the stated order with respect to $n$. For many commonly encountered types of populations (unimodal, etc.), the importance of the neglected terms tends to increase as $p_r$ deviates from $\tfrac{1}{2}$. Examination of the expansions used in the derivations suggests that the standard deviation estimates presented are usually satisfactory if
$$p_r q_r (n + 1)^{1/5} \ge 2.$$
This relation implies that the magnitude of the increments with respect to which the final expansions are made never exceeds $\tfrac{1}{2} p_r q_r$.

Let $\sigma\{w\}$ denote the population standard deviation of $w$. The statistic advocated for estimating $\sigma\{x(r)\}$ is
$$s[x(r)] = \tfrac{1}{2}(n + 1)^{-3/10} \sqrt{p_r q_r}\,\{x[r + (n + 1)^{4/5}] - x[r - (n + 1)^{4/5}]\},$$
where $x[z] = x(\text{largest integer contained in } z)$. This statistic has the properties
$$E\{s[x(r)]\} = \sigma\{x(r)\} + O(n^{-9/10}) = \sigma\{x(r)\}\{1 + O(n^{-2/5})\},$$
$$\sigma\{s[x(r)]\} = O(n^{-9/10}).$$
The statistic $s[x(r)]$ has the smallest order error of all expected value estimates of $\sigma\{x(r)\}$ that are of the form $a[x(r + i) - x(r - i)]$, where $i$ is $o(n)$. Here the order of the error of an estimate is considered to be the larger of the order of $E(\text{estimate}) - \sigma\{x(r)\}$ and the order of $\sigma\{\text{estimate}\}$.
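The statistic is simple to compute from a sample. The following is a minimal sketch under the definitions above; the function name `walsh_sd_estimate` and the choice to raise an error when $r \pm (n+1)^{4/5}$ falls outside the sample are assumptions made for illustration.

```python
import math


def walsh_sd_estimate(sample, r):
    """s[x(r)] = (1/2)(n+1)^(-3/10) sqrt(p_r q_r) {x[r + i] - x[r - i]},
    with i = (n+1)^(4/5) and x[z] = x(largest integer contained in z)."""
    x = sorted(sample)
    n = len(x)
    p = r / (n + 1)
    q = 1.0 - p
    i = (n + 1) ** 0.8
    upper = math.floor(r + i)
    lower = math.floor(r - i)
    if lower < 1 or upper > n:
        raise ValueError("x(r) is too close to the tails for this sample size")
    a = 0.5 * (n + 1) ** (-0.3) * math.sqrt(p * q)
    # Order statistics are 1-indexed in the text, 0-indexed in the list.
    return a * (x[upper - 1] - x[lower - 1])
```

For a sample whose order statistics sit exactly at the uniform quantiles $k/(n+1)$, the result is close to $\sqrt{p_r q_r}/\sqrt{n+1}$, the leading term of $\sigma\{x(r)\}$ when the density equals one.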

The notation $f[x]$ is used to represent the probability density function of the population sampled. For the situation considered,
$$\sigma\{x(r)\} = \frac{\sqrt{p_r q_r}}{\sqrt{n + 1}} \cdot \frac{1}{f[\theta(p_r)]} + O(n^{-3/2}).$$
This relation shows that a modification of $s[x(r)]$ can be used in estimating the value of the density function at the point $x = \theta(p_r)$. Explicitly,
$$\sqrt{n + 1}\; s[x(r)] / \sqrt{p_r q_r}$$
furnishes an expected value estimate of $1/f[\theta(p_r)]$ that is accurate to terms of order $n^{-2/5}$.


3. Derivations. This section contains a verification of the properties stated for $s[x(r)]$ in the preceding sections.

Consider any integer $t$ such that $1 \le t \le n$. From the results of [1],
$$E[x(t)] = \theta(p_t) - \frac{p_t q_t\, f'[\theta(p_t)]}{2(n + 2)\, f^3[\theta(p_t)]} + O(n^{-2}), \tag{1}$$
$$\sigma\{x(t)\} = \frac{\sqrt{p_t q_t}}{\sqrt{n + 1}\; f[\theta(p_t)]} + O(n^{-3/2}).$$
Here $p_t = t/(n + 1)$ and $q_t = 1 - p_t$. These expansions, combined with appropriate use of Taylor series expansions, form the basis for the derivations.

Let $i = \epsilon (n + 1)^{\alpha}$ = integer, where $0 \le \alpha < 1$ and both $\epsilon$ and $\alpha$ are $O(1)$. Using (1) and expanding around $r$ in Taylor series,
$$E[x(r \mp i)] = \theta(p_r) \mp \frac{\epsilon}{(n + 1)^{1-\alpha} f[\theta(p_r)]} - \left[\frac{\epsilon^2}{2(n + 1)^{2-2\alpha}} + \frac{p_r q_r}{2(n + 2)}\right] \frac{f'[\theta(p_r)]}{f^3[\theta(p_r)]} + O(n^{-3+3\alpha}) + O(n^{-2+\alpha}),$$
$$E\{a[x(r + i) - x(r - i)]\} = \frac{2 a \epsilon}{(n + 1)^{1-\alpha} f[\theta(p_r)]} + O(a n^{-3+3\alpha}) + O(a n^{-2+\alpha}),$$
$$\sigma\{a[x(r + i) - x(r - i)]\} = \frac{a \sqrt{2\epsilon}}{(n + 1)^{1-\alpha/2} f[\theta(p_r)]} + \cdots.$$

The problem is to use these relations to determine suitable values for $\epsilon$, $\alpha$, and $a$.

Since $a[x(r + i) - x(r - i)]$ is an expected value estimate of $\sigma\{x(r)\}$,
$$\frac{2 a \epsilon}{(n + 1)^{1-\alpha}} = \frac{\sqrt{p_r q_r}}{(n + 1)^{1/2}}, \quad \text{or} \quad a = \frac{1}{2\epsilon} \sqrt{p_r q_r}\,(n + 1)^{1/2 - \alpha}.$$
Using this expression for $a$,
$$E\{a[x(r + i) - x(r - i)]\} = \sigma\{x(r)\} + O(n^{-5/2 + 2\alpha}) + O(n^{-3/2}),$$
$$\sigma\{a[x(r + i) - x(r - i)]\} = O(n^{-1/2 - \alpha/2}).$$
Thus increasing $\alpha$ decreases the order of magnitude of
$$\sigma\{a[x(r + i) - x(r - i)]\},$$
but increases the order of
$$E\{a[x(r + i) - x(r - i)]\} - \sigma\{x(r)\}.$$
Hence the order of the error is minimized when
$$-1/2 - \alpha/2 = -5/2 + 2\alpha.$$
Thus $\alpha = 4/5$ appears to be the most desirable choice for $\alpha$.
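The balancing step can be checked by direct arithmetic. The lines below only solve the stated equation and evaluate the common exponent:

```latex
\[
-\tfrac{1}{2} - \tfrac{\alpha}{2} = -\tfrac{5}{2} + 2\alpha
\;\Longrightarrow\; 2 = \tfrac{5}{2}\,\alpha
\;\Longrightarrow\; \alpha = \tfrac{4}{5},
\qquad
\left.\left(-\tfrac{1}{2} - \tfrac{\alpha}{2}\right)\right|_{\alpha = 4/5}
= \left.\left(-\tfrac{5}{2} + 2\alpha\right)\right|_{\alpha = 4/5}
= -\tfrac{9}{10}.
\]
```

At this value both the fixed and the random error are of order $n^{-9/10}$, which agrees with the orders quoted in the summary.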

In $\sigma\{a[x(r + i) - x(r - i)]\}$, the parameter $\epsilon$ appears predominantly as the factor $1/\sqrt{\epsilon}$. In $E\{a[x(r + i) - x(r - i)]\} - \sigma\{x(r)\}$ the predominant factor is $\epsilon$. Solution of the equation
$$\epsilon = 1/\sqrt{\epsilon}$$
suggests that $\epsilon = 1$ is an appropriate compromise choice for $\epsilon$.

Use of $\alpha = 4/5$, $\epsilon = 1$, and the expression for $a$ yields the results
$$i = (n + 1)^{4/5}, \qquad a = \tfrac{1}{2}(n + 1)^{-3/10} \sqrt{p_r q_r},$$
and verifies the properties stated for $s[x(r)]$.

A UNIQUENESS PROPERTY NOT ENJOYED BY THE NORMAL DISTRIBUTION

By George P. Steck
Sandia Corporation
1. Summary. It is well known that if $X$ and $Y$ (or $1/X$ and $1/Y$) are independently normally distributed with mean zero and variance $\sigma^2$, then $X/Y$ has a Cauchy distribution. It is the purpose of this note to show that the converse statement is not true. That is, the fact that the ratio of two independent, identically distributed, random variables $X$ and $Y$ follows a Cauchy distribution is not sufficient to imply that $X$ and $Y$ (or $1/X$ and $1/Y$) are normally distributed. This will be shown by exhibiting several counterexamples.

Received October 21, 1957; revised December 26, 1957.

2. Construction of counterexamples. Let $X$ and $Y$ be independent, identically distributed, random variables with common symmetric density function $p$. Let $\phi$ denote the characteristic function of $(4/\pi) \log |X|$, let $Z = X/Y$, and let $\psi$ denote the characteristic function of $(4/\pi) \log |Z|$. The fact that $Z$ has a Cauchy distribution implies that
$$p_{(4/\pi) \log |Z|}(u) = \frac{1}{4 \cosh(\pi u/4)}, \qquad -\infty < u < \infty,$$
and, hence,
$$\psi(t) = \int_{-\infty}^{\infty} \frac{e^{itu}\, du}{4 \cosh(\pi u/4)} = \frac{1}{\cosh 2t}.$$
Since
$$\frac{4}{\pi} \log |Z| = \frac{4}{\pi} \log |X| - \frac{4}{\pi} \log |Y|,$$
it follows that
$$\phi(t)\, \phi(-t) = \frac{1}{\cosh 2t},$$
and, therefore,
$$\phi(t) = \frac{e^{i\theta(t)}}{(\cosh 2t)^{1/2}} = \frac{\cos \theta(t)}{(\cosh 2t)^{1/2}} + i\, \frac{\sin \theta(t)}{(\cosh 2t)^{1/2}}, \tag{1}$$
where $\theta$ is continuous, real, odd, and of such a form that $\phi$ is a characteristic function.


Since $\phi(t)$ must be inverted by contour integration to find corresponding density functions, equation (1) suggests that $\theta$ be chosen so as to eliminate the square root. The relations
$$\cosh 2t = \cosh^2 t + \sinh^2 t = 1 + 2 \sinh^2 t$$
provide two functions $\theta$ which accomplish this, namely,
$$\theta_1(t) = \arctan \tanh t$$
and
$$\theta_2(t) = \arctan(\sqrt{2} \sinh t).$$
Other functions $\theta$ which immediately suggest themselves, even though they do not eliminate the square root, are
$$\theta_3(t) = 0, \quad \text{and} \quad \theta_4(t) = \ldots$$
The corresponding functions $\phi$ are
$$\phi_1(t) = \frac{\cosh t}{\cosh 2t} + i\, \frac{\sinh t}{\cosh 2t},$$
$$\phi_2(t) = \frac{1}{\cosh 2t} + i \sqrt{2}\, \frac{\sinh t}{\cosh 2t},$$
$$\phi_3(t) = \frac{1}{(\cosh 2t)^{1/2}}, \quad \text{and} \quad \phi_4(t) = \frac{e^{i\theta_4(t)}}{(\cosh 2t)^{1/2}}.$$
If these functions $\phi$ are inverted (see [1], pp. 388-389, and [2], p. 30) and a change of variable made from $(4/\pi) \log |X|$ to $X$, assuming $X$ symmetric, then the corresponding density functions are
$$p_1(x) = \frac{\sqrt{2}}{\pi}\, \frac{x^2}{1 + x^4},$$
$$p_2(x) = \frac{2}{\pi}\, \frac{x^4}{(1 + x^2)(1 + x^4)},$$
$$p_3(x) = \ldots,$$
$$p_4(x) = \ldots$$

Using $\theta(-t)$ instead of $\theta(t)$ provides additional densities $p^*(x)$ (if $p$ is the density function of $X$ then $p^*$ is the density function of $1/X$). For example,
$$p_1^*(x) = \frac{\sqrt{2}}{\pi}\, \frac{1}{1 + x^4},$$
$$p_2^*(x) = \frac{2}{\pi}\, \frac{1}{(1 + x^2)(1 + x^4)}.$$

ESTIMATION OF A REGRESSION LINE WITH BOTH VARIABLES SUBJECT TO ERROR UNDER AN UNUSUAL IDENTIFICATION CONDITION

By Herman Rubin
University of Oregon

Suppose the random variables $w_j = (\xi_j, u_j, v_j)$ are independently and identically distributed with joint distribution $F$. Then if $\iiint e^{\alpha u + \beta v}\, dF(\xi, u, v)$ exists for all $\alpha$, $\beta$ in a neighborhood of 0 and $\iiint e^{t\xi}\, dF(\xi, u, v)$ does not exist for all $t$ in any neighborhood of 0, Jeeves [1] has shown that the parameter $\theta$ in
$$x_j = \xi_j \cos \theta + u_j, \qquad y_j = \xi_j \sin \theta + v_j, \tag{1}$$
is identified. We shall construct a consistent estimate of $\theta$ (mod $\pi$) if these conditions are satisfied.
First, let us consider a univariate distribution $G$ with moment generating function $g$. Then $g(t) = \sum (\mu_n / n!)\, t^n$, $\mu_n$ the $n$th moment, and if the radius of convergence is $\rho$, it is well known that
$$\frac{1}{\rho} = \limsup_{n} \left(\frac{|\mu_n|}{n!}\right)^{1/n}. \tag{2}$$
An easy application of Liapounoff's inequality and Stirling's formula shows that
$$\frac{1}{\rho} = \limsup_{n}\, \frac{e\, (\mu_{2n})^{1/2n}}{2n}. \tag{3}$$

Therefore a natural procedure would seem to be to consider the sample moments $m_{2n}(\phi)$ of $x_j \sin \phi - y_j \cos \phi$ and to define $\hat{\theta}$ as that value of $\phi$ minimizing $\max_n\, (m_{2n}(\phi))^{1/2n}/n$. For fixed sample size, this maximum exists. We shall show that this estimate is indeed consistent, and even converges with probability one to $\theta$.
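The procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computation: the maximum over $n$ is truncated at a hypothetical `n_max`, the minimization is carried out over a finite grid on $[0, \pi)$, and the simulated data (a symmetrized lognormal $\xi_j$, which has no moment generating function, with bounded $u_j$ and $v_j$) are chosen only to satisfy the identification condition.

```python
import math
import random


def objective(xs, ys, phi, n_max):
    """max over 1 <= n <= n_max of (m_2n(phi))^(1/2n) / n."""
    residuals = [x * math.sin(phi) - y * math.cos(phi) for x, y in zip(xs, ys)]
    best = 0.0
    for n in range(1, n_max + 1):
        m_2n = sum(e ** (2 * n) for e in residuals) / len(residuals)
        best = max(best, m_2n ** (1.0 / (2 * n)) / n)
    return best


def estimate_theta(xs, ys, grid_size=180, n_max=6):
    """Grid minimizer over [0, pi); grid_size and n_max are illustrative."""
    grid = [math.pi * k / grid_size for k in range(grid_size)]
    return min(grid, key=lambda phi: objective(xs, ys, phi, n_max))


def simulate(theta, size, seed=0):
    """Hypothetical test data satisfying the identification condition."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(size):
        xi = rng.choice((-1.0, 1.0)) * math.exp(rng.gauss(0.0, 1.0))
        u = rng.uniform(-0.5, 0.5)
        v = rng.uniform(-0.5, 0.5)
        xs.append(xi * math.cos(theta) + u)
        ys.append(xi * math.sin(theta) + v)
    return xs, ys
```

Since $x_j \sin \phi - y_j \cos \phi = \xi_j \sin(\phi - \theta) + (u_j \sin \phi - v_j \cos \phi)$, the heavy-tailed $\xi_j$ drops out of the residuals only when $\phi = \theta$ (mod $\pi$), which is what the minimization exploits.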


First let us show that $\max_n\, m_{2n}(\theta)^{1/2n}/n$ is bounded as a function of the sample size $N$ with probability one. Let
$$\psi(t) = E(\cosh[t(u_j \sin \theta - v_j \cos \theta)]) = \sum_n \frac{\nu_{2n}\, t^{2n}}{(2n)!},$$
where $\nu_{2n}$ is the $2n$th moment of $u_j \sin \theta - v_j \cos \theta$. Then [2], for $|t| \le s < r$, $\psi_N(t) = \sum_n m_{2n} t^{2n}/(2n)!$ converges to $\psi(t)$ uniformly with probability one. Thus $\psi_N(t)$ is bounded with probability one, and since $m_{2n}^{1/2n}/n \le (K/t)[\psi_N(t)]^{1/2n}$, $\max_n\, m_{2n}(\theta)^{1/2n}/n$ is bounded with probability one. Hence with probability greater than $1 - \epsilon/3$, $H_\epsilon$ can be used for the bound. Similarly,
$$\max_n\, \frac{1}{n} \left(\frac{1}{N} \sum_j (u_j \sin \phi - v_j \cos \phi)^{2n}\right)^{1/2n} \le K_\epsilon$$
for all $N$ with probability greater than $1 - \epsilon/3$. Let $\delta$ be given, $0 < \delta < \pi$, and let $\gamma_n$ be the $n$th moment of $\xi_j$. Since $(\gamma_{2n})^{1/2n}/n$ can be made arbitrarily large by selecting $n$ large enough, select $n$ so that
$$\frac{(\gamma_{2n})^{1/2n}}{n} > \frac{H_\epsilon + K_\epsilon}{\sin \delta}.$$
Then with probability greater than $1 - \epsilon/3$,
$$\frac{1}{n} \left(\frac{1}{N} \sum_j \xi_j^{2n}\right)^{1/2n} > \frac{H_\epsilon + K_\epsilon}{\sin \delta}$$
for all $N$ sufficiently large. By Minkowski's inequality
$$m_{2n}(\phi)^{1/2n} \ge |\sin(\phi - \theta)| \left(\frac{1}{N} \sum_j \xi_j^{2n}\right)^{1/2n} - \left(\frac{1}{N} \sum_j (u_j \sin \phi - v_j \cos \phi)^{2n}\right)^{1/2n}.$$
Therefore with probability greater than $1 - \epsilon$,
$$\max_n\, \frac{(m_{2n}(\phi))^{1/2n}}{n} > \max_n\, \frac{(m_{2n}(\theta))^{1/2n}}{n}$$
for all $N$ sufficiently large for all $\phi$ not in the interval $(\theta - \delta, \theta + \delta)$ (mod $\pi$).


REFERENCES

ON THE DECOMPOSITION OF CERTAIN $\chi^2$ VARIABLES

By Robert V. Hogg and Allen T. Craig
University of Iowa


It is well known that if the sum, say $Q = Q_1 + Q_2$, of two stochastically independent variables is $\chi^2$ with $r$ d.f., and if $Q_1$ is also $\chi^2$ with $r_1$ d.f., then $Q_2$ is likewise $\chi^2$ with $r_2 = r - r_1$ d.f. If the hypothesis of stochastic independence is removed, little can be said about $Q_2$. It seems to us quite interesting that if the variables under consideration are real symmetric quadratic forms in either central or non-central, stochastically independent or dependent normal variables, and if the hypothesis of stochastic independence of $Q_1$ and $Q_2$ is replaced by the weaker hypothesis $Q_2 \ge 0$, then $Q_1$ and $Q_2$ are stochastically independent so that $Q_2$ is itself a $\chi^2$ variable with $r_2 = r - r_1$ d.f.

Before we state our theorem, we recall [1] that the real symmetric quadratic form $Y'BY$ in $n$ mutually stochastically independent normal variables $Y' = (y_1, y_2, \dots, y_n)$ with unit variances and means $U' = (u_1, u_2, \dots, u_n)$ has a non-central $\chi^2$ distribution whose characteristic function is
$$\phi(t) = \exp\left[\frac{it\theta}{1 - 2it}\right] (1 - 2it)^{-r/2}$$
if and only if $B^2 = B$. Here, $\theta = U'BU$ and $r$ is the rank of $B$.

THEOREM. Let $Q = Q_1 + \dots + Q_{k-1} + Q_k$, where $Q = X'AX$ and $Q_j = X'A_jX$, $j = 1, 2, \dots, k$, are real symmetric quadratic forms in $n$ normally distributed variables $X' = (x_1, x_2, \dots, x_n)$ with means $M' = (m_1, m_2, \dots, m_n)$ and real symmetric positive definite variance-covariance matrix $V$. Let $Q, Q_1, \dots, Q_{k-1}$ have non-central $\chi^2$ distributions with parameters $r$, $\theta$ and $r_j$, $\theta_j$ $(j = 1, \dots, k - 1)$, respectively, and let $Q_k$ be non-negative. Then $Q_1, Q_2, \dots, Q_k$ are mutually stochastically independent and $Q_k$ has a non-central $\chi^2$ distribution with parameters
$$r_k = r - \sum_{j=1}^{k-1} r_j, \qquad \theta_k = \theta - \sum_{j=1}^{k-1} \theta_j.$$


Proof. We first prove the theorem for $k = 2$. There exists a real symmetric positive definite matrix $C$ such that $C'C = V$. If we let $X = CY$, $Y' = (y_1, y_2, \dots, y_n)$, and at the same time let $M = CU$, $U' = (u_1, u_2, \dots, u_n)$, then $y_1, y_2, \dots, y_n$ are mutually stochastically independent normal variables with unit variances and means $U' = (u_1, u_2, \dots, u_n)$. Also
$$X'AX = X'A_1X + X'A_2X$$
becomes $Y'BY = Y'B_1Y + Y'B_2Y$, where $B = C'AC$, $B_1 = C'A_1C$, $B_2 = C'A_2C$, and $B = B_1 + B_2$. By hypothesis, $Y'BY$ and $Y'B_1Y$ have non-central $\chi^2$ distributions and $Y'B_2Y \ge 0$. Thus $B^2 = B$ and $B_1^2 = B_1$. With a suitably chosen orthogonal matrix $L$, $L'BL$ is a diagonal matrix having $r$ ones and $n - r$ zeros on the principal diagonal. Since $B_1$ and $B_2$ are positive semi-definite, each element on the principal diagonal of $L'B_1L$ and $L'B_2L$ is non-negative and hence each of these matrices has a zero on the principal diagonal corresponding to each zero on that of $L'BL$. Moreover all elements in the rows and columns of $L'B_1L$ and $L'B_2L$ in which these zeros appear are likewise zero. If we properly choose our notation we may then write $L'BL = L'B_1L + L'B_2L$, using submatrices, as
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} G_1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} G_2 & 0 \\ 0 & 0 \end{pmatrix}.$$
If we multiply on the left by $L'B_1L$ and make use of $B_1^2 = B_1$, we have
$$L'B_1B_2L = 0.$$
That is, $B_1B_2 = 0$, so, by a result of Carpenter [1], $Y'B_1Y$ and $Y'B_2Y$ (that is, $Q_1$ and $Q_2$) are stochastically independent. Since $Q$ and $Q_1$ have non-central $\chi^2$ distributions it follows that $Q_2$ has a non-central $\chi^2$ distribution with parameters $r_2 = r - r_1$, $\theta_2 = \theta - \theta_1$. For $k > 2$, the proof of the theorem is easily completed by induction.
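The left-multiplication step in the proof can be spelled out as follows. This is only a worked restatement, using $L'L = I$ and the fact that the non-zero block of $L'B_1L$ lies in the rows and columns where $L'BL$ is the identity:

```latex
\begin{align*}
(L'B_1L)(L'BL) &= (L'B_1L)(L'B_1L) + (L'B_1L)(L'B_2L), \\
L'B_1L &= L'B_1^2L + L'B_1B_2L = L'B_1L + L'B_1B_2L, \\
0 &= L'B_1B_2L .
\end{align*}
```

Because $L$ is orthogonal, $L'B_1B_2L = 0$ is equivalent to $B_1B_2 = 0$.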

As an example, let $(x_1, y_1), \dots, (x_n, y_n)$ denote a random sample from a bivariate normal distribution having unit variances, means $m_1$ and $m_2$, and correlation coefficient $\rho$. It is fairly obvious that the left member and the first term of the right member of
$$\sum_{1}^{n} \frac{x_i^2 - 2\rho x_i y_i + y_i^2}{1 - \rho^2} = \frac{n\bar{x}^2 - 2\rho n \bar{x}\bar{y} + n\bar{y}^2}{1 - \rho^2} + \sum_{1}^{n} \frac{(x_i - \bar{x})^2 - 2\rho(x_i - \bar{x})(y_i - \bar{y}) + (y_i - \bar{y})^2}{1 - \rho^2}$$
have non-central $\chi^2$ distributions with parameters $r = 2n$,
$$\theta = n(m_1^2 - 2\rho m_1 m_2 + m_2^2)/(1 - \rho^2),$$
and $r_1 = 2$, $\theta_1 = n(m_1^2 - 2\rho m_1 m_2 + m_2^2)/(1 - \rho^2)$ respectively. Accordingly, the non-negative form
$$\sum_{1}^{n} \frac{(x_i - \bar{x})^2 - 2\rho(x_i - \bar{x})(y_i - \bar{y}) + (y_i - \bar{y})^2}{1 - \rho^2}$$
has a central $\chi^2$ distribution with $2n - 2$ degrees of freedom.

A NOTE ON THE GENERATION OF RANDOM NORMAL DEVIATES

By G. E. P. Box and Mervin E. Muller
Princeton University

1. Introduction. Sampling experiments often require the generation of large 
numbers of random normal deviates. When an electronic computer is used it is 
desirable to arrange for the generation of such normal deviates within the ma- 
chine itself rather than to rely on tables. Pseudo random numbers can be gener- 
ated by a variety of methods within the machine and the purpose of this note is 
to give what is believed to be a new method for generating normal deviates from 
independent random numbers. This approach can be used on small as well as 
large scale computers. A detailed comparison of the utility of this approach with 
other known methods (such as: (1) the inverse Gaussian function of the uniform 
deviates, (2) Teichroew’s approach, (3) a rational approximation such as that 
developed by Hastings, (4) the sum of a fixed number of uniform deviates and 
(5) rejection-type approach), has been made elsewhere [1] by one of the authors 
(M.M.). It is shown that the present approach not only gives higher accuracy 


than previous methods but also compares in speed very favourably with other 
methods. 


2. Method. The following approach may be used to generate a pair of random deviates from the same normal distribution starting from a pair of random numbers.

Method: Let $U_1$, $U_2$ be independent random variables from the same rectangular density function on the interval (0, 1). Consider the random variables:
$$X_1 = (-2 \log_e U_1)^{1/2} \cos 2\pi U_2, \qquad X_2 = (-2 \log_e U_1)^{1/2} \sin 2\pi U_2. \tag{1}$$
Then $(X_1, X_2)$ will be a pair of independent random variables from the same normal distribution with mean zero, and unit variance.
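Transformation (1) translates directly into code. The following is a minimal sketch using Python's standard library; the function names are illustrative, and the shift `1.0 - rng.random()` is an implementation detail added here so that the logarithm never receives zero.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform (0, 1) numbers to a pair of standard normal deviates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_pairs(count, seed=None):
    """Generate `count` pairs (X1, X2) from pseudo random uniform numbers."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
        u2 = rng.random()
        pairs.append(box_muller(u1, u2))
    return pairs
```

Each call to `box_muller` consumes two uniform numbers and returns two deviates, so no random numbers are discarded.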

Justification: From (1) (giving attention to principal values), one obtains at once the inverse relationships:
$$U_1 = e^{-(X_1^2 + X_2^2)/2}, \qquad U_2 = \frac{1}{2\pi} \arctan \frac{X_2}{X_1}.$$
It follows that the joint density of $X_1$, $X_2$ is
$$f(X_1, X_2) = \frac{1}{2\pi} e^{-(X_1^2 + X_2^2)/2} = \frac{1}{\sqrt{2\pi}} e^{-X_1^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-X_2^2/2} = f(X_1) f(X_2);$$
thus the desired conclusions, including the independence of $X_1$ and $X_2$, are obtained.

The above approach is motivated by the following considerations: the probability density $f(X_1, X_2)$ is constant on circles, so $\theta = \arctan X_2/X_1$ is uniformly distributed (0, $2\pi$). Further, the square of the length of the radius vector $r^2 = X_1^2 + X_2^2$ has a Chi-squared distribution with two degrees of freedom. If $U$ has a rectangular density on (0, 1) then $-2 \log_e U$ has a Chi-squared distribution with two degrees of freedom. Proceeding in the reverse order we arrive at (1).


3. Generalizations and other random variables. Observations from the Chi-squared distribution with $2k$ degrees of freedom can of course be generated by adding together the $k$ terms, $\sum_{i=1}^{k} (-2 \log_e U_i)$, and for Chi-squared with $2k + 1$ degrees of freedom one may add the square of a normal deviate generated by the above method. Deviates from the F-distribution and from the t-distribution are obtained by calculating the appropriate ratio of deviates generated as above. From independent random normal deviates well known methods can of course be used to generate n-dimensional normal deviates with arbitrary means and variance-covariance matrix.
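The Chi-squared constructions described above can be sketched as follows. The function names are illustrative, and the random number source `rng` is assumed to be a `random.Random` instance supplied by the caller.

```python
import math
import random


def chi_squared_even(k, rng):
    """Chi-squared deviate with 2k degrees of freedom: sum of k terms -2 log U_i."""
    return sum(-2.0 * math.log(1.0 - rng.random()) for _ in range(k))


def chi_squared_odd(k, rng):
    """Chi-squared deviate with 2k + 1 degrees of freedom.

    Adds the square of one normal deviate, generated by transformation (1),
    to a deviate with 2k degrees of freedom.
    """
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return chi_squared_even(k, rng) + z * z
```

A usage example: `chi_squared_even(3, random.Random(0))` draws one deviate with six degrees of freedom.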


4. Convenience and accuracy. The method suggested here grew out of the 
desire to have a way of generating normal deviates which would be reliable in 
the tails of the distribution. Since most computing centers have library programs 
to compute values of trigonometric functions, logarithms, and square roots this 
approach requires little additional machine program writing. The accuracy ob- 
tained depends essentially on the precision of the available library programs, 
whereas that of other methods cannot readily be increased.

ON A THEOREM IN METRIC SPACES

By V. S. Varadarajan and R. Ranga Rao
Indian Statistical Institute


0. Introduction. In his paper "On a class of probability spaces" ([1]), D. Blackwell observed that the class of Borel sets of a metric space may be a separable $\sigma$-field without the metric space being separable. However, in a subsequent letter to one of the authors, he stated that the question remained open. The object of the present note is to prove that the separability of the $\sigma$-field of Borel sets implies separability of the metric space, assuming the continuum hypothesis. What is actually used is not the continuum hypothesis but the following proposition, which we will abbreviate as $\mathcal{P}$: If $u$ is an uncountable cardinal, $2^u > c$ (the cardinal of the continuum). This is easily deduced from the continuum hypothesis and it seems to us that it has not so far been proved without the continuum hypothesis (cf. [4]). The main conclusion is as follows: A metric space is separable if and only if the cardinality of the Borel sets is $\le c$, provided we assume $\mathcal{P}$. It is also shown that the above theorem implies $\mathcal{P}$.


1. The main result. We introduce certain notations. $X$ is a metric space and 
$\mathcal{B}$ is the σ-field generated by the open subsets of $X$. Sets of $\mathcal{B}$ are called Borel sets of 
$X$. $\mathcal{B}$ is called separable if there is a sequence $\{A_n\}$ of sets of $\mathcal{B}$ generating it. 
In that case, the cardinality of $\mathcal{B}$ is $\le c$ ([2]). Before proving the main result we prove 
an auxiliary result, interesting in itself. 


THEOREM 1. $X$ is separable if and only if every disjoint family of nonempty 
open subsets of $X$ is countable. 

Proof. If $X$ is separable, its topology has a countable basis $G_1, G_2, \ldots$. 
Since any nonempty open set of $X$ contains a nonempty $G_i$, the existence of an 
uncountable disjoint family of nonempty open subsets of $X$ implies the existence 
of an uncountable disjoint family of nonempty $G_i$'s, which is impossible. To 
prove the converse, let us suppose that every disjoint family of nonempty open 
subsets of $X$ is countable. Let $n$ be an integer $\ge 1$ and let $\mathcal{K}_n$ be defined as 
follows: $\mathcal{K}_n = \{A : A \subset X;\ x, y \in A,\ x \ne y \Rightarrow d(x, y) > 1/n\}$. Elements of $\mathcal{K}_n$ are 
subsets of $X$ and are partially ordered by the relation of set inclusion. Further, 
every linearly ordered sub-family of $\mathcal{K}_n$ has a supremum in the family (namely, 
the set-union) and hence, by Zorn's lemma, there are maximal elements 
containing any element of $\mathcal{K}_n$, in particular any point of $X$. Let $A_n$ be one such 
nonempty maximal element. Maximality of $A_n$ implies that if $y \in X - A_n$, then $d(y, x) \le 1/n$ 
for some $x \in A_n$. Further, each $A_n$ must be countable, as otherwise the 
spheres with centres at the points of $A_n$ and radii $1/(2n)$ will be an uncountable 
disjoint family of nonempty open subsets of $X$. 

Let now $n$ run over $1, 2, \ldots$ and set $A = \bigcup_n A_n$. $A$ is countable and for any 
$y \in X - A$ and any positive integer $n$, there is an $x_n \in A_n$ such that $d(y, x_n) \le 1/n$. 
This shows that $A$ is dense in $X$. Since $A$ is countable, this completes the 
proof that $X$ is separable. 

Remark. This result need not be true if $X$ is not metric. See, for example, [3]. 

We now prove our main result. 

THEOREM 2. (Under assumption $\mathcal{Q}$) $X$ is separable if and only if the cardinality of 
$\mathcal{B}$ is $\le c$. In particular, $X$ is separable if and only if $\mathcal{B}$ is separable. 

Proof. If $X$ is separable, its topology has a countable base $G_1, G_2, \ldots$ which 
generates $\mathcal{B}$ and hence the cardinality of $\mathcal{B}$ is $\le c$. Conversely let the cardinality of $\mathcal{B}$ be 
$\le c$. If $\{A_\alpha\}_{\alpha \in \Gamma}$ is any disjoint family of nonempty open subsets of $X$, then 
every subunion of the $A_\alpha$'s is open and hence $\in \mathcal{B}$. There are $2^u$ such subunions, 
where $u$ is the cardinal of $\Gamma$, and since the cardinality of $\mathcal{B}$ is $\le c$, we have $2^u \le c$. 
This however implies (in virtue of assumption $\mathcal{Q}$) that $u \le \aleph_0$. Theorem 1 
now applies and proves that $X$ is separable. This completes the proof. 

Remark. We can show that Theorem 2 implies $\mathcal{Q}$. To see this, let $X$ be an 
uncountable set with cardinal $u$. Give $X$ the discrete topology so that $\mathcal{B}$ is the 
class of all subsets of $X$. The cardinality of $\mathcal{B}$ is thus $2^u$ and since $X$ is not separable, 
Theorem 2 implies that $2^u > c$. This is precisely assumption $\mathcal{Q}$. 
(Abstracts of papers presented at the Ames, Iowa Meeting of the Institute, 
April 3-5, 1958.) 


11. Bias and Confidence in Not-quite Large Samples. (Preliminary Report) 
John W. Tukey, Princeton University, (By Title). 


The linear combination of estimates based on all the data with estimates based on parts 
thereof seems to have been first treated in print as a means of reducing bias by Jones (J. 
Amer. Stat. Assn., Vol. 51 (1956), pp. 54-83). Let $y_{(\cdot)}$ be the estimate based on all the data, 
$y_{(i)}$ that based on all but the $i$th piece, $\bar{y}_{(i)}$ the average of the $y_{(i)}$. Quenouille (Biometrika, 
Vol. 43 (1956), pp. 353-360) has pointed out some of the advantages of $n y_{(\cdot)} - (n - 1)\bar{y}_{(i)}$ 
as such an estimate of much reduced bias. Actually, the individual expressions 
$n y_{(\cdot)} - (n - 1) y_{(i)}$ may, to a good approximation, be treated as though they were $n$ independent 
estimates. Not only is each nearly unbiased, but their average sum of squares of deviations 
is nearly $n(n - 1)$ times the variance of their mean, etc. In a wide class of situations they 
behave rather like projections from a non-linear situation on to a tangent linear situation. 
They may thus be used in connection with standard confidence procedures to set closely 
approximate confidence limits on the estimand. (Received December 26, 1957.) 
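
The individual expressions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each "piece" is taken to be a single observation, and the names `pseudo_values` and `jackknife` are hypothetical labels, not terms used in the abstract.

```python
from statistics import mean, pvariance, variance


def pseudo_values(data, estimator):
    """Return the n expressions n*y_all - (n - 1)*y_without_i, one per deleted piece."""
    n = len(data)
    y_all = estimator(data)
    return [
        n * y_all - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]


def jackknife(data, estimator):
    """Treat the expressions as n nearly independent estimates.

    Returns their mean (the reduced-bias estimate) and the usual
    estimate of the variance of that mean.
    """
    values = pseudo_values(data, estimator)
    return mean(values), variance(values) / len(values)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.6, 4.2]
    # pvariance divides by n and is biased; the combination removes that bias.
    estimate, var_of_mean = jackknife(sample, pvariance)
    print(estimate, variance(sample), var_of_mean)
```

For the divide-by-n variance the reduced-bias estimate coincides with the usual unbiased sample variance, and for the sample mean the expressions are simply the observations themselves. The pair returned by `jackknife` is what one would feed into a standard Student-type confidence procedure.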


12. Limiting Distributions of k-sample Test Criteria of Kolmogorov-Smirnov-v. 
Mises Type. J. Kiefer, Cornell University, (By Title). 


Let $S_j$ be the sample d.f. of $n_j$ independent, identically distributed random variables 
with common unknown continuous d.f. $F_j$ ($1 \le j \le k$), the $S_j$ being independent. For 
testing the hypothesis $H: F_1 = \cdots = F_k$, several criteria were suggested by the author in Ann. 
Math. Stat., Vol. 26 (1955), p. 775. Among these are $T = \sup_x \sum_j n_j [S_j(x) - S(x)]^2$ and 

$$W = \int \sum_j n_j [S_j(x) - S(x)]^2 \, dS^*(x),$$

where $S = \sum_j n_j S_j / \sum_j n_j$ and $S^* = \sum_j a_j S_j$ with $\sum_j a_j = 1$. It is proved by the method 
indicated in the above reference that, under $H$, the limit of $P\{T < a^2\}$ as all $n_j \to \infty$ is 

$$\frac{2^{(5-k)/2}\, a^{1-k}}{\Gamma((k-1)/2)} \sum_{n=1}^{\infty} \frac{\alpha_n^{k-3} \exp[-\alpha_n^2/(2a^2)]}{[J_{(k-1)/2}(\alpha_n)]^2},$$

where $a > 0$ and $\alpha_n$ is the $n$th positive zero of the Bessel function $J_{(k-3)/2}$; alternative 
expressions are also given. When $k = 4$ or $k = 2$ the summand above reduces to an elementary 
function; the latter case gives the Kolmogorov-Smirnov distribution, since $T^{1/2}$ is the 
Smirnov statistic when $k = 2$. The limiting d.f. of $W$ is expressible in a series involving 
Hermite polynomials when $k$ is odd and Bessel functions when $k$ is even. For $k = 2$, $W$ is 
the test suggested by Lehmann and Rosenblatt, and the above d.f. is the limiting $\omega^2$ d.f. 
in the form given by Anderson and Darling. (Received January 6, 1958.) 
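
The remark that the case k = 2 gives the Kolmogorov-Smirnov distribution can be checked numerically. The sketch below assumes the limiting expression reads 2^((5-k)/2) a^(1-k) / Gamma((k-1)/2) times the sum over n of alpha_n^(k-3) exp(-alpha_n^2 / (2 a^2)) / J_((k-1)/2)(alpha_n)^2, with alpha_n the zeros of J_((k-3)/2); this reading is a reconstruction and should be treated as an assumption. For k = 2 the Bessel functions of order -1/2 and 1/2 are elementary, so only the standard library is needed. The function names are hypothetical.

```python
import math


def kiefer_limit_k2(a: float, terms: int = 50) -> float:
    """Assumed limiting value of P{T < a^2} for k = 2 samples."""
    k = 2
    const = 2.0 ** ((5 - k) / 2) * a ** (1 - k) / math.gamma((k - 1) / 2)
    total = 0.0
    for n in range(1, terms + 1):
        # J_{-1/2}(x) = sqrt(2/(pi x)) cos x, so its nth positive zero is (2n - 1) pi / 2.
        alpha = (2 * n - 1) * math.pi / 2
        # J_{1/2}(x)^2 = (2/(pi x)) sin(x)^2.
        j_half_sq = 2.0 / (math.pi * alpha) * math.sin(alpha) ** 2
        total += alpha ** (k - 3) * math.exp(-alpha ** 2 / (2 * a * a)) / j_half_sq
    return const * total


def kolmogorov_cdf(a: float, terms: int = 100) -> float:
    """Classical alternating series 1 - 2 * sum (-1)^(j-1) exp(-2 j^2 a^2)."""
    return 1.0 - 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * a * a) for j in range(1, terms + 1)
    )


if __name__ == "__main__":
    for a in (0.5, 1.0, 1.5):
        print(a, kiefer_limit_k2(a), kolmogorov_cdf(a))
```

The two series agree to rounding error, which supports the assumed constants at least for k = 2; it says nothing about larger k.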


13. A Rule for Action Based on Percentage Changes in the Sample Mean. D. 
B. OWEN, Sandia Corporation, (By Title). 


A random selection is made of $n$ items from a normal population $X$, each item is measured 
once, and the sample mean $\bar{x}$ is computed. The sample items are identified by some means 
and the sample and the remaining population are mixed at random. They are then 
subjected to some condition, such as storage, after which the same items that were first 
sampled are measured again, giving a new mean $\bar{y}$. Some action is taken only if the new mean 
$\bar{y}$ differs from the old mean $\bar{x}$ by more than $p\bar{x}$, where $p > 0$. The probability of taking action 
using the above rule is shown to be expressible in terms of the bivariate normal cumulative 
probability function. If there is no change in the mean (from $X$ to $Y$), the probability for 
action increases monotonically with an increase in the standard deviation. If there is no 
increase in the standard deviation (from $X$ to $Y$), the probability for action increases 
monotonically with any increase in the mean. However, for a fixed (relatively large) increase in 
the mean, the probability for action drops with increasing standard deviation and then 
increases. (Received February 5, 1958.) 


14. On a Multivariate Gamma Distribution. P. R. Krishnaiah and M. M. Rao, 
University of Minnesota. 


From the relation between the univariate Gamma and Gaussian distributions one 
naturally considers the corresponding p-variate cases. Some writers in the past implied this in 
their approach to this problem. But the properties of the latter distribution are not well 
utilized. The derivation of a p-variate Gamma distribution by Krishnamoorthy and Par- 
thasarathy (these Annals, 1951) and later used by Gurland (these Annals, 1955) was not 
too direct. In this paper a simple derivation using the too familiar properties of the Normal 
and Wishart distributions is given. Also some interesting connections between the Gamma 
and Gaussian distributions are discussed. A special case for p = 2 (for correlations) has 
been given in Cramér’s book (p. 317). But that property is shown to be true for p > 2. 
Next the ‘‘Arithmetical Character” of this distribution in the sense of P. Lévy (Proc. Cam- 
bridge Philos. Soc. (1948), p. 295) is considered. Some small and large sample properties are 
also discussed. (Received February 6, 1958.) 


15. An Expression for the Cumulative Distribution Function of the Noncentral 
t-Distribution. D. B. Owen, Sandia Corporation, (By Title). 


The cumulative distribution function of the noncentral $t$-distribution may be expressed 
in terms of the univariate normal integral and elementary functions for an even number of 
degrees of freedom and for an odd number of degrees of freedom in terms of the univariate 
normal integral, elementary functions, and the $T(h, a)$ function. The $T(h, a)$ function was 
tabulated by the present author in Ann. Math. Stat., Vol. 27 (1956), pp. 1075-1090. The 
above results were obtained by repeated integration by parts. For example, with one degree 
of freedom $\Pr(T \le t) = G(-\delta/\sqrt{1 + t^2}) + 2T(\delta/\sqrt{1 + t^2},\, t)$, where $G(z)$ is the univariate 
normal integral from minus infinity to $z$, and $\delta$ is the noncentrality parameter. This 
expression is especially useful since the noncentral $t$-distribution has not been tabulated for 
one degree of freedom. (Received February 6, 1958.) 
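
The one-degree-of-freedom expression lends itself to a short numerical sketch. The code below assumes the formula Pr(T <= t) = G(-delta / sqrt(1 + t^2)) + 2 T(delta / sqrt(1 + t^2), t) and uses the standard integral definition T(h, a) = (1 / (2 pi)) times the integral from 0 to a of exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx, evaluated by Simpson's rule rather than by table lookup. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Univariate normal integral from minus infinity to z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def owens_t(h: float, a: float, steps: int = 2000) -> float:
    """T(h, a) by composite Simpson's rule; steps must be even."""
    if a == 0.0:
        return 0.0

    def integrand(x: float) -> float:
        return math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)

    dx = a / steps  # negative when a < 0, which gives T(h, -a) = -T(h, a)
    total = integrand(0.0) + integrand(a)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * dx)
    return total * dx / 3.0 / (2.0 * math.pi)


def noncentral_t_cdf_1df(t: float, delta: float) -> float:
    """Pr(T <= t) for one degree of freedom under the assumed expression."""
    root = math.sqrt(1.0 + t * t)
    return normal_cdf(-delta / root) + 2.0 * owens_t(delta / root, t)
```

Two sanity checks are available without tables: with delta = 0 the result should be the Cauchy d.f. 1/2 + arctan(t) / pi, and with t = 0 it should be G(-delta).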


16. The Fourth Product Moment of a Binary Random Process. J. A. McFadden, 
Purdue University, (introduced by Judah Rosenblatt) (By Title). 


Let $x(t)$ describe a stationary random process, and let $y(t) = 1$ when $x(t) \ge 0$ and $y(t) = -1$ 
when $x(t) < 0$. Let $s(\tau_1, \tau_2, \tau_3)$ denote the fourth product moment, 

$$E[y(t)\,y(t + \tau_1)\,y(t + \tau_2)\,y(t + \tau_3)],$$

where $0 \le \tau_1 \le \tau_2 \le \tau_3$. If $x(t)$ is a Gaussian process, then $s$ is related to the quadrivariate 
normal integral, which apparently cannot be expressed in closed form. For practical 
applications it seems advisable to make different assumptions about $x(t)$ (or about $y(t)$). Let 
$E[y(t)] = 0$ and let all product moments of odd degree in $y(t)$ be zero. Consider 
furthermore the zeros of the function $x(t)$. If the zeros obey the Poisson distribution, then a 
particularly simple result follows for $s$ and for all higher moments. Another assumption is the 
following: Let unspecified events occur at times $t_1, t_2, \ldots$, according to the Poisson 
distribution, the average number of events per unit time being denoted by $\alpha$. If alternate 
events (those at $t_1, t_3, \ldots$) are designated as zeros of $x(t)$, then the autocorrelation 
function of $y(t)$ is $E[y(t)y(t + \tau)] = e^{-\alpha\tau} \cos \alpha\tau$, and the desired fourth moment is 

$$s = e^{-(u+w)} \cos u \cos w - e^{-(u+2v+w)} \sin u \sin w,$$

where $u = \alpha\tau_1$, $v = \alpha(\tau_2 - \tau_1)$, and $w = \alpha(\tau_3 - \tau_2)$. (Received February 10, 1958.) 


17. Approximate Solutions for the Probability Density of Zero-Crossing Intervals 
in a Gaussian Process. J. A. McFadden, Purdue University, (introduced 
by Judah Rosenblatt). 


Let $x(t)$ be a stationary Gaussian process, and let $P_0(\tau)$ be the probability density of 
the lengths of intervals between successive zeros in this process. Under the assumption that 
the length of a given zero-crossing interval is independent of the sum of the previous 
$(2m + 2)$ interval lengths, where $m$ takes on all values, $m = 0, 1, 2, \ldots$, the following 
integral equation can be derived for $P_0(\tau)$: $P_0(\tau) = Q_1(\tau) - \int_0^{\tau} Q_2(l) P_0(\tau - l)\, dl$, where 
$Q_1(\tau)\, d\tau$ is the conditional probability of a zero with negative slope in the interval 
$(t + \tau, t + \tau + d\tau)$, given a zero with positive slope at time $t$, and $Q_2(\tau)\, d\tau$ is the conditional 
probability of a zero with positive slope in $(t + \tau, t + \tau + d\tau)$, given a zero with positive slope 
at $t$. Using expressions for $Q_1(\tau)$ and $Q_2(\tau)$ given by S. O. Rice, the integral equation has 
been solved numerically for several choices of spectral density. The results compare favor- 
ably with experiment, and the agreement is much better than can be obtained by the usual 
renewal methods, i.e., assuming that all interval lengths are independent. (Received Febru- 
ary 10, 1958.) 


18. Minimal Complete Classes of Tests. D. L. Burkholder, University of 
Illinois. 


Minimal complete classes of tests are found for a number of common testing problems 
including, for example, those listed by Lehmann and Scheffé in Sankhya, Vol. 15 (1955), 
p. 224, with respect to the exponential family of distributions. The proofs are based partly 
on the theory of complete and sufficient statistics and partly on other ideas needed and 
developed for those cases in which the hypothesis set $\omega$ and the alternative set $\Omega - \omega$ are 
separated by an indifference zone. The kinds of results obtained are illustrated in the 
following special case: Let $X_1$ and $X_2$ be independent random variables where $X_i$ is binomial 
$(n_i, p_i)$, $0 < p_i < 1$, $i = 1, 2$. Let $\omega$ be a subset of $\{(p_1, p_2) \mid p_1 = p_2\}$, $\Omega - \omega$ be a subset 
of $\{(p_1, p_2) \mid p_1 < p_2\}$, and suppose there are positive numbers $m$ and $M$ such that if 

$$S = \{(p_1, p_2) \mid m < p_1/p_2 < M\}$$

then each of $S \cap \omega$ and $S \cap (\Omega - \omega)$ has the origin as a limit point. Let $C$ be the class of 
all tests $\varphi$ of the form: $\varphi(x_1, x_2) = 1$ if $x_1 < c(x_1 + x_2)$, $= a(x_1 + x_2)$ if equality holds, $= 0$ 
otherwise. Then $C$ is the minimal complete class of tests for the problem of testing $(p_1, p_2) \in \omega$ 
against $(p_1, p_2) \in \Omega - \omega$. Thus, both Fisher's exact test and the classical test are 
admissible since both are in $C$. (Received February 11, 1958.) 


19. On the Fitting of Some Contagious Distributions. S. K. Katti and John 
Gurland, Iowa State College. 


A number of compound and generalized distributions are compared by using such char- 
acteristics as skewness, kurtosis, and the ratio of the first two frequencies. A study has also 
been made of the limiting forms of the distributions. Some of these distributions have been 
fitted to sampled data by estimating the parameters by various methods in order to gain 
some empirical knowledge of the usefulness of these distributions and the relative merits 
or demerits of the methods of estimation. (Received February 12, 1958.) 


20. Notes on the Spearman-Karber Procedures in Bioassay. (Preliminary 
Report) Byron Wm. Brown, Jr., University of Minnesota. 


The maximum bias of the Spearman-Karber estimator of the L.D. 50 over possible choices 
of dose levels is examined under various conditions on the distribution function, such as 
unimodality and symmetry. The maximum mean square error of the estimator is examined 
also. The results are compared with actual values for several distributions. The results are 
also used to make some comparisons of the Spearman-Karber estimator with some com- 
monly used parametric methods of estimating the L.D. 50. (Received February 12, 1958.) 


21. Biases in Prediction by Regression for Certain Incompletely Specified 
Models. Harold Larsen, Iowa State College, (transmitted by H. T. 
David). 


An experimenter doesn't know whether to assume a “full” population regression model 
$E(y_j) = \sum_{i=1}^{k} \beta_i x_{ij}$ or a “partial” population regression model $E(y_j) = \sum_{i=1}^{m} \beta_i x_{ij}$, $k \ge m$. 
He decides the matter by the natural preliminary F-test of the hypothesis that the last 
$(k - m)$ $\beta_i$'s are zero. He uses the full model for subsequent predictions if the hypothesis is 
rejected, and uses the partial model for subsequent predictions if the hypothesis is not 
rejected. Call this predictor $y^*$. 

The full model is assumed to be true, the error terms being normally distributed with 
zero mean. Under these assumptions the expected value of the estimator $y^*$ is derived. The 
expected value of the estimated variance of $y^*$ is also derived if a certain sometimes-pool 
procedure is used. (Received February 12, 1958.) 


22. Independence of Statistics and Characterization of the Multivariate Normal 
Distribution. S. G. Ghurye, University of Chicago and Ingram Olkin, 
Michigan State University, (By Title). 


Some of the results proved are: If $x$, $y$ are independent p-dimensional random vectors 
and $A$ is a non-singular matrix such that $x + y$ and $x + Ay$ are independent, then $x$, $y$ are 
normal. If $x_1, \ldots, x_n$ are independent random vectors, $A_1, \ldots, A_n, B_1, \ldots, B_n$ are 
non-singular commutative symmetric matrices such that $\sum A_i x_i$ and $\sum B_i x_i$ are independent, 
then the $x_i$ are normal. If $f_1(t), \ldots, f_n(t)$ are c.f.'s and there exist positive numbers $\alpha_1, \ldots, \alpha_n$ 
such that in some neighborhood of the origin $\prod f_i^{\alpha_i}(t)$ agrees with an entire function 
of finite order $\rho$, where $\rho$ is larger than the exponent of convergence of the zeros of the 
function, then $\rho$ cannot exceed 2. This is applied to characterize the normal distribution 
by the independence of a sum of independent r.v.'s (not all of which need be identically 
distributed) and a polynomial of special, specified type. 


Let $x_1, \ldots, x_n$ be independent p-variate normal random vectors. Let $x_j = (x_{j1}, \ldots, x_{jp})$. 
Necessary and sufficient conditions are given for the independence of (1) $q_{ij}^{(1)} = x_i A_{ij} x_j' + x_i a_i' + x_j a_j'$ 
and $q_{ij}^{(2)} = x_i B_{ij} x_j' + x_i b_i' + x_j b_j'$, (2) $(q_{ij})$ and $\sum A_i x_i$, and (3) $\sum A_i x_i$ and $\sum B_i x_i$. 
(Received February 4, 1958.) 


23. Contributions to the Theory of Rank Order Statistics--The One-Sample 
Case. I. Richard Savage, University of Minnesota. 


The testing that a distribution has median zero against slippage is considered using the 
techniques developed earlier (these Annals, Vol. 27 (1956) pp. 590-615, and Vol. 28 (1957) 
pp. 967-977). Let $Z = (Z_1, \ldots, Z_N)$ be a random vector with $Z_i = 1(0)$ if the $i$th largest 
in absolute value in a sample of $N$ from the density $f(x)$ is positive (negative). Then 


Conditions are found implying $P(Z = z) > P(Z = z')$ where $z$ is derived from $z'$ by 
replacing a 0 by a 1, or interchanging a 0 and 1 in $z'$ by moving the 1 to the left. These conditions 
are met by the normal and other symmetric exponential distributions. (Received February 
17, 1958.) 


24. An Identity of Use in Non-Linear Least Squares. M. B. Wilk, Bell 
Telephone Laboratories. 


Under rather general conditions the identity $f(x) = f(x_0) + (x - x_0) f'[(x + x_0)/2]$ is a 
necessary and sufficient condition that $f(x)$ be a quadratic function. The identity 
generalizes immediately and in the same form to p variables. A procedure due to Gauss for 
iterative non-linear least squares fitting of observations $y_i$ to a function $f(x_i; \theta)$ involves 
essentially the repeated linear regression of $[y_i - f(x_i; \theta_0)]$ on $[\partial f(x_i; \theta)/\partial\theta]_{\theta = \theta_0}$, with the 
regression coefficient $\delta$ giving “improved” estimates of $\theta$ by $(\theta_0 + \delta)$. The generalization 
to p parameters is immediate. 

This process can oscillate wildly (for example, out of computer range) and does not 
necessarily converge. A modification of this ‘Linear Gauss’’ procedure, based on the 
identity above, will approximate a “Quadratic Gauss” procedure while always solving 
only sets of linear equations. Advantages are a damping of the oscillations of the Linear 
Gauss, possible decrease in the extent of computing, and possible improvements in con- 
vergence characteristics. (Received February 17, 1958.) 
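
The identity can be checked numerically in the one-variable case. The sketch below assumes it reads f(x) = f(x0) + (x - x0) f'((x + x0) / 2) and evaluates the gap between the two sides; the helper name `midpoint_identity_gap` is hypothetical, and the code does not attempt the modified Gauss procedure itself.

```python
def midpoint_identity_gap(f, fprime, x: float, x0: float) -> float:
    """Left side minus right side of f(x) = f(x0) + (x - x0) * f'((x + x0) / 2)."""
    return f(x) - (f(x0) + (x - x0) * fprime((x + x0) / 2.0))


def quadratic(x: float) -> float:
    return 3.0 * x * x - 2.0 * x + 5.0


def quadratic_prime(x: float) -> float:
    return 6.0 * x - 2.0


def cubic(x: float) -> float:
    return x ** 3


def cubic_prime(x: float) -> float:
    return 3.0 * x * x


if __name__ == "__main__":
    # The gap vanishes for the quadratic and equals (x - x0)^3 / 4 for the cubic.
    print(midpoint_identity_gap(quadratic, quadratic_prime, 2.5, -1.0))
    print(midpoint_identity_gap(cubic, cubic_prime, 2.0, 0.0))
```

For the cubic the gap grows like the cube of the step, which gives some feeling for how far a midpoint-slope step departs from an exact quadratic step.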


25. Unbiased Regression Estimators. W. H. Williams, Iowa State College. 


In sample surveys one desires unbiased estimators of population characteristics such as 
the mean $\bar{Y}$ of a variate $y$, and that these estimates be made with good precision. There 
are many ways of improving precision, one of which is the use of auxiliary information. In 
particular, this information is sometimes used in a regression estimator obtained by 
evaluating the line of best fit at the point $\bar{X}$. The properties of this estimator are derived from 
the stochastic model $y_i = A + Bx_i + e_i$ where the $e_i$ are random errors which have 
expectation zero, common variance and are uncorrelated with each other. The estimator $\hat{y}$ of 
the population mean $\bar{Y}$ is then of the form $\hat{y} = \bar{y} + b(\bar{X} - \bar{x})$ where $\bar{y}$ and $\bar{x}$ are sample 
means and $b$ is the least squares estimator of the regression slope. If the paired 
observations $(y_i, x_i)$ satisfy the above linear model then $\hat{y}$ has expectation $\bar{Y}$. However, it is often 
unrealistic to assume that such a model is satisfied by the data and in such an event $\hat{y}$ will 
usually be biased. For large populations the expectation of $\hat{y}$ is given by $\bar{Y} - \mathrm{Cov}(\bar{x}, b)$ so 
that $\hat{y}$ has a bias of $-\mathrm{Cov}(\bar{x}, b)$. $\mathrm{Cov}(\bar{x}, b)$ refers to the joint distribution of $\bar{x}$ and $b$ in 
random samples of size $n$. An unbiased estimator of $\bar{Y}$ is obtained which has favorable 
efficiency. This estimator is easily generalized to the multivariate situation. (Received 
February 20, 1958.) 
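
The ordinary regression estimator discussed above (the biased one, not the unbiased estimator the abstract announces, whose form is not given here) can be sketched as follows. It assumes the form sample mean of y plus b times (population mean of x minus sample mean of x); the function name `regression_estimate` and its argument names are hypothetical.

```python
def regression_estimate(x, y, pop_mean_x: float) -> float:
    """Evaluate the least squares line through the sample at the population mean of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx  # least squares slope
    return y_bar + b * (pop_mean_x - x_bar)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]
    ys = [2.0 + 3.0 * v for v in xs]  # exactly linear data
    print(regression_estimate(xs, ys, pop_mean_x=5.0))
```

When the data lie exactly on a line the estimate reproduces the line's value at the population mean of x. Repeating the calculation over many random samples from a population that is not linear would exhibit the bias term described in the abstract.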


26. Maximum Likelihood Estimation from Incomplete Data for Continuous 
Distributions. Scott A. Krane, Iowa State College. 


A method is given for obtaining the maximum likelihood estimates of parameters of 
continuous distributions from sample data which is “incomplete” due to truncation, censoring 
or grouping. The method may be applied to any distribution for which the likelihood equa- 
tions are soluble in the complete data case. No special functions are required. 

The likelihood equations for incomplete samples contain two types of terms: (a) the 
differentials of the likelihood evaluated at observed variate values $x_i$, and (b) integrals of 
the above differentials over intervals of missing variate values. The method presented 
replaces the integrals in (b) by weighted sums of terms similar to (a) evaluated at variate 
values, $x_r$, “representative” of the intervals of missing values. The likelihood equations 
for the incomplete sample are then identical with those for a complete sample of values $x_i$ 
and $x_r$. The $x_r$ values and weights required are functions of the parameters, so that an 
iterative procedure is used to obtain the estimates. (Received February 26, 1958.) 


27. Unbiased Ratio Estimators in Stratified Sampling. José Nieto de Pascual, 
(transmitted by W. H. Williams). 


The paper presents some theory of unbiased ratio estimators of the population mean $\bar{Y}$, 
in stratified sampling, computed from samples of $k$ drawn from each of $L$ strata ($k < L$). 

Two unbiased ratio estimators and their exact variances, as well as unbiased estimates 
of the latter, are given. The derivations follow the lines of an unbiased ratio estimator for 
simple random sampling, $y'$, introduced by Hartley and Ross (Nature (174), August 7, 1954, 
p. 270). The two estimators are (a) an unbiased “separate” ratio estimator formed by 
obtaining the $y'$ estimator for each stratum, and (b) an unbiased “combined” ratio estimator 
computed by the $y'$ formulae from $k$ pairs of $\bar{y}_{st}$, $\bar{x}_{st}$, where $\bar{y}_{st}$, $\bar{x}_{st}$ are the familiar 
unbiased estimators of $\bar{Y}$, $\bar{X}$, computed from stratified samples of one unit drawn from each 
of the $L$ strata. 

These two unbiased estimators are then compared with the ‘‘combined’’ ratio estimator 
(Hansen, Hurwitz, and Gurney, J. Amer. Stat. Assn., Vol. 41 (1946), pp. 173-189), and con- 
ditions on the population characteristics are described when the unbiased estimators are 
more efficient. Generalizations and the special case k = 2 are discussed in detail. (Received 
February 27, 1958.) 


28. On the Laws of Cauchy and Gauss. R. G. Laha, The Catholic University of 
America. 


The following theorems are proved: THEOREM 1. Let $x$ and $y$ be two independently and 
identically distributed random variables having a common distribution function $F(x)$. Let 
the quotient $w = x/y$ follow the Cauchy law distributed symmetrically about the origin. 
Then $F(x)$ has the following general properties: (1) it is symmetric about the origin, 
absolutely continuous, and has a continuous probability density function $f(x) = F'(x)$; (2) the 
random variable $x$ has an unbounded range; (3) the probability density function $f(x)$ 
satisfies the equation $\int_0^{\infty} f(x) f(wx)\, x\, dx = c_0/(1 + w^2)$ holding for all $w$, where $c_0$ is a constant. 
THEOREM 2. In addition to the conditions of Theorem 1, let $F(x)$ have finite moments of 
all orders. Then $F(x)$ is normal. (Received February 28, 1958.) 


(Abstracts of papers presented at the Gatlinburg, Tennessee Meeting of the Institute, 
April 10-12, 1958.) 


29. On the Simple von Neumann Model of Dynamic Economic Equilibrium 
as a Markov Chain. (Preliminary Report) David Rosenblatt, American 
University, (By Title). 


The simple von Neumann model of dynamic economic equilibrium (the special case in 
which (i) there is the same number of “goods” as of basic “productive processes” and (ii) 
there is a single “output” for each “productive process”) is simply transformed and 
structurally related to two stationary Markov chains. Results are obtained for aggregation and 
consolidation in the simple von Neumann model and these are compared and contrasted 
with analogous results for macro-statistical input-output formulations. (Received Febru- 
ary 5, 1958.) 


30. Tests on a Variance-Covariance Matrix. NATHAN MANTEL, National Cancer 
Institute. 


A class of tests on the elements of the variance-covariance matrix is proposed. The class 
includes as a special case Pitman’s Test for equality of two correlated variances. Depending 
on the assumptions made the test may be one for uniformity, for equality of variances or 
for equality of covariances. The test may be adapted so as to provide more specific con- 
trasts. Tests on the corresponding correlation matrix through the use of either empirical 
or population standardizing factors are also considered. 

An interesting adaptation of the procedure is one which permits testing the interaction 
in a two-way classification in the absence of replication. 

The testing procedures depend on the fact that when the row sums of the variance-co- 
variance matrix are equal the mean of a set of observations is uncorrelated with any of the 
deviations from the mean. The test is primarily one on the significance of the multiple cor- 
relation of the mean on the set of deviations. The power efficiency of the test for specific 
alternatives may be increased by testing the correlation of the mean with a subset of 
deviations or linear combinations of deviations. Efficiency may also be increased by shifting 
attention from the original variables to linear transforms. In some instances a single change 
in sign of some of the variables can increase efficiency. (Received February 10, 1958.) 


31. An Upper Bound for the Variance of Certain Statistics. Wassily Hoeffding, 
University of North Carolina. 


It is shown that if $X_1, X_2, X_3$ are independent and identically distributed random 
variables, if $0 \le f(X_1, X_2) = f(X_2, X_1) \le 1$, and $Ef(X_1, X_2) = p$, then 
$Ef(X_1, X_2)f(X_1, X_3) - p^2 \le H(p)$, where $H(p) = p^{3/2} - p^2$, $\tfrac{1}{2} \le p \le 1$, and 
$H(p) = (1 - p)^{3/2} - (1 - p)^2$, $0 \le p \le \tfrac{1}{2}$. The sign of equality holds if, with probability one, 
$f(X_1, X_2) = g(X_1)g(X_2)$ (for $p \ge \tfrac{1}{2}$) or $f(X_1, X_2) = 1 - g(X_1)g(X_2)$ (for $p \le \tfrac{1}{2}$), where 
$g(X)$ takes the values 0 and 1 only. 
This inequality implies an upper bound for the variance of the statistic 

$$U = \sum_{i \ne j} f(X_i, X_j)\,[n(n - 1)]^{-1}$$

in terms of its mean. This class of statistics includes M. G. Kendall's rank correlation 
coefficient $t$ and (except for a minor difference) the Cramér-von Mises statistic $\omega^2$. In the 
former case the inequality has been conjectured by Daniels and Kendall. (Received 
February 12, 1958.) 
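
The bound and its equality case can be verified exactly for small discrete distributions. The sketch below assumes the bound reads E f(X1, X2) f(X1, X3) - p^2 <= H(p) with H(p) = p^(3/2) - p^2 for p >= 1/2 and (1 - p)^(3/2) - (1 - p)^2 for p <= 1/2; the function names `bound_h` and `kernel_moments` are hypothetical.

```python
from itertools import product


def bound_h(p: float) -> float:
    """Assumed upper bound H(p)."""
    if p >= 0.5:
        return p ** 1.5 - p ** 2
    return (1.0 - p) ** 1.5 - (1.0 - p) ** 2


def kernel_moments(f, values, probs):
    """Return (p, E f(X1,X2) f(X1,X3) - p^2) for i.i.d. discrete X1, X2, X3."""
    atoms = list(zip(values, probs))
    p = sum(pa * pb * f(a, b) for (a, pa), (b, pb) in product(atoms, repeat=2))
    second = sum(
        pa * pb * pc * f(a, b) * f(a, c)
        for (a, pa), (b, pb), (c, pc) in product(atoms, repeat=3)
    )
    return p, second - p * p


if __name__ == "__main__":
    # Equality case: f = g(X1) g(X2) with g taking the values 0 and 1 only.
    print(kernel_moments(lambda a, b: a * b, [0, 1], [0.19, 0.81]))
    # A different symmetric kernel, for which the inequality is strict.
    print(kernel_moments(lambda a, b: (a + b) / 2, [0, 1], [0.7, 0.3]))
```

With P(g = 1) = 0.81 the product kernel has p = 0.6561 and attains the bound; the averaging kernel stays strictly below it.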


32. On a Test for the Equality of Several Means. K. V. Ramachandran, 
Demographic Training and Research Centre, India, (By Title). 


Let $x_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$) be random samples of sizes $n$ from $k$ univariate 
normal populations with means $\mu_i$ and variance $\sigma^2$ ($i = 1, 2, \ldots, k$). The hypothesis 
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ against $H$: Not $H_0$ is equivalent to the union of $H_{0i}: \mu_i = \mu$ (say) against 
$H_i: \mu_i \ne \mu$ (where $\mu$ is unknown) for every $i = 1, 2, \ldots, k$. To test $H_{0i}: \mu_i = \mu$ against 
$H_i: \mu_i \ne \mu$ for any given $i$ we have the test based on $t_i = [(\bar{x}_i - \bar{x})/S][nk/(k - 1)]^{1/2}$ where 
$n\bar{x}_i = \sum_{j=1}^{n} x_{ij}$, $k\bar{x} = \sum_{i=1}^{k} \bar{x}_i$ and $k(n - 1)S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$. We accept $H_{0i}$ against 
$H_i$ for any given $i$ if $|t_i| \le t_\alpha$ where $t_\alpha$ is the upper $\alpha/2$ per cent point of Student's $t$ 
distribution with $k(n - 1)$ d.f. Hence we accept $H_0$ against $H \ne H_0$ iff (if and only if) 
$\max_i |t_i| \le t_\alpha$, i.e., iff $\max_i |[(\bar{x}_i - \bar{x})/S][nk/(k - 1)]^{1/2}| \le t_\alpha$. This two-sided 
version of the Nair Statistic provides an alternative test in the analysis of variance situation 
and gives simultaneous confidence bounds on all $\mu_i - \mu$ ($i = 1, 2, \ldots, k$) with a confidence 
coefficient $1 - \alpha$. Power properties and multivariate and other generalizations of these 
tests are being investigated. (Received February 14, 1958.) 
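
The statistic can be computed directly from its definition. The sketch below assumes t_i = [(mean_i - grand mean) / S] [nk / (k - 1)]^(1/2) with k(n - 1) S^2 equal to the pooled within-sample sum of squares; it returns the individual t_i and their maximum absolute value, leaving the critical value to be taken from tables. The function name `max_abs_t` is hypothetical.

```python
import math


def max_abs_t(samples):
    """Return ([t_1, ..., t_k], max_i |t_i|) for k samples of common size n."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    means = [sum(s) / n for s in samples]
    grand_mean = sum(means) / k
    # Pooled within-sample variance on k(n - 1) degrees of freedom.
    s_squared = sum(
        (x - m) ** 2 for s, m in zip(samples, means) for x in s
    ) / (k * (n - 1))
    scale = math.sqrt(n * k / (k - 1)) / math.sqrt(s_squared)
    t_values = [(m - grand_mean) * scale for m in means]
    return t_values, max(abs(t) for t in t_values)


if __name__ == "__main__":
    t_values, statistic = max_abs_t([[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]])
    print(t_values, statistic)
```

The hypothesis of equal means would be accepted when the returned maximum does not exceed the chosen critical value.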


33. On a Test for the Equality of Several Variances. K. V. RAMACHANDRAN, 
Demographic Training and Research Centre, India, (By Title). 


Let $x_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$) be random samples of sizes $n$ from $k$ univariate 
normal populations with means $\mu_i$ and variances $\sigma_i^2$ ($i = 1, 2, \ldots, k$). The hypothesis 
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ against $H$: Not $H_0$ is equivalent to the union of $H_{0i}: \sigma_i^2 = \sigma^2$ (say) 
against $H_i: \sigma_i^2 \ne \sigma^2$ (where $\sigma^2$ is unknown) for every $i = 1, 2, \ldots, k$. To test $H_{0i}: \sigma_i^2 = \sigma^2$ 
against $H_i: \sigma_i^2 \ne \sigma^2$ for any given $i$ we have the test based on $F_i = S_i^2 / \sum_{j \ne i} S_j^2$ where 
$(n - 1)S_i^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$. We accept $H_{0i}$ against $H_i$ for any given $i$ if $F_1 \le F_i \le F_2$, 
where $\Pr\{F_i < F_1 \mid H_{0i}\} + \Pr\{F_i > F_2 \mid H_{0i}\} = \alpha$ and $F_i$ has an $F$ distribution with 
$(n - 1)$, $(k - 1)(n - 1)$ d.f. Hence we accept $H_0$ against $H \ne H_0$ iff $F_1 \le F_{\min} \le F_{\max} \le F_2$, 
where $F_{\min} = S_{\min}^2 / \sum_{i \ne \min} S_i^2$ and $F_{\max} = S_{\max}^2 / \sum_{i \ne \max} S_i^2$, i.e., iff 
$g_1 \le S_{\min}^2 / \sum_{i=1}^{k} S_i^2 \le S_{\max}^2 / \sum_{i=1}^{k} S_i^2 \le g_2$. This two-sided version of Cochran's statistic 


provides an alternative test in the homogeneity of variances situation. The distribution 
problem, power properties and multivariate and other generalizations of these tests are 
being investigated. (Received February 14, 1958.) 


34. An Optimum Property of Some Bechhofer-Type Non-Sequential Multiple- 
Decision Rules. Wm. Jackson Hall, University of North Carolina. 


R. E. Bechhofer has proposed a single-sample multiple-decision procedure for ranking 
means of normal populations with known variances, and, with M. Sobel, a procedure for 
ranking variances of normal populations (Ann. Math. Stat., Vol. 25 (1954), pp. 16-39 and 
273-289). We assume that the sample sizes for the populations are equal, and, in the first 
case, that the variances are equal. Their rules guarantee a correct ranking with prescribed 
probability when the population parameters are sufficiently distinct (in a prescribed way). 
This paper proves that no other rules can accomplish this with a smaller sample size; that 
is, their rules are ‘‘most economical’’. This is not true if the sample sizes are unequal, but 
it is true for any analogous procedure for ranking populations according to a parameter 
when, for each sample, there is a numerical sufficient statistic with a monotone likelihood 
ratio and the parameter is a location or scale (but not range) parameter in the distribution 
of the statistic. These results are obtained from application of “most economical decision 
theory” (Ann. Math. Stat., Vol. 25 (1954), p. 814). (Received February 17, 1958.) 


35. Second Order Rotatable Designs in Three or More Factors. R. C. Bose 
and Norman R. Draper, University of North Carolina. 


Previous attempts to obtain second-order rotatable designs for three factors made use 
of the regular figures in three dimensions. A new method that has been successfully 
developed employs various sets of points which satisfy the conditions: $\sum x^2 = \sum y^2 = \sum z^2$, 
$\sum x^4 = \sum y^4 = \sum z^4$, $\sum x^2y^2 = \sum y^2z^2 = \sum z^2x^2$, all odd moments up to and 
including order four being zero. The basic sets may be combined in various ways to give a 
number of infinite classes of rotatable designs, each class dependent on a parameter. This 
parameter may take any value in a specified range, which depends only on the number of 
points in the design. By giving specific values to the parameters in the various classes, all 
of the second-order designs suggested in the Institute of Statistics Mimeo Series No. 149 
by D. A. Gardiner, A. H. E. Grandage, and R. J. Hader were derived as special cases. An 
example of such a class is as follows. The $N = 20 + n_0$ points $(\pm a, \pm a, \pm a)$; $(\pm c_1, 0, 0)$, 
$(0, \pm c_1, 0)$, $(0, 0, \pm c_1)$; $(\pm c_2, 0, 0)$, $(0, \pm c_2, 0)$, $(0, 0, \pm c_2)$; $(0, 0, 0)$ ($n_0$ times), where 
$c_i = \{N - 8a^2 \pm [N(16a^2 - N)]^{1/2}\}^{1/2}/2$ ($i = 1, 2$), form a second-order rotatable design if 
$.0732N \ge a^2 \ge .0625N$. When $c_2 = 0$, the well-known cube and octahedron design, with center points, 
is obtained. The method has been used additionally to construct designs for both 
higher-order rotatability and higher dimension (number of factors). (Received February 17, 1958.) 


36. A Markov Chain Resulting From a Certain Sorting Problem. A. Bruce 
Clarke, University of Michigan. 


Consider the following sorting problem: Objects are chosen consecutively from an 
infinite population consisting of $r$ different categories in proportions 
$p_1, p_2, \ldots, p_r$, $\sum p_i = 1$. The objects chosen are sorted by category and 
placed in $r$ piles. Periodically one of the categories $1, 2, \ldots, r$ is selected at 
random with probabilities $q_1, q_2, \ldots, q_r$, $\sum q_i = 1$, and the pile of elements 
of the selected category is removed from the system. Denoting the number of elements in 
the $i$th pile immediately preceding the $t$th pile removal by $x_{it}$, the distribution of 
the random vector $x_t = (x_{1t}, \ldots, x_{rt})$ is studied as $t \to \infty$. This forms 
a stationary Markov chain. The limiting distributions of the individual components 
$x_{it}$, $t \to \infty$, are obtained explicitly, and a recursion formula is established 
which leads to the limiting distribution of $x_t$. One result is that the mean total number 
of individuals in the system at any time, $E[\sum_{i=1}^{r} x_{it}]$, is minimized if the 
probabilities $q_i$ are chosen proportional to $\sqrt{p_i}$. 
(Received February 17, 1958.) 
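
The square-root rule stated in abstract 36 can be checked numerically with a minimal sketch. It rests on assumptions not stated in the abstract: a fixed average number of objects arrives between consecutive removals, and in the stationary limit a given pile has accumulated objects for a geometric number of periods with mean 1/q_i, so its mean size just before a removal is proportional to p_i / q_i. The function names `mean_total` and `sqrt_rule` are hypothetical and chosen for illustration only.

```python
import math


def mean_total(p, q, arrivals_per_period=1.0):
    """Stationary mean total pile size just before a removal.

    Assumption (not from the abstract): pile i has been accumulating for a
    geometric number of periods with mean 1/q[i], and gains
    arrivals_per_period * p[i] objects per period on average.
    """
    return arrivals_per_period * sum(pi / qi for pi, qi in zip(p, q))


def sqrt_rule(p):
    """Removal probabilities proportional to the square roots of p."""
    roots = [math.sqrt(pi) for pi in p]
    total = sum(roots)
    return [r / total for r in roots]


p = [0.6, 0.3, 0.1]                      # illustrative category proportions
q_sqrt = sqrt_rule(p)                    # rule from the abstract
q_prop = list(p)                         # removal proportional to p
q_unif = [1.0 / len(p)] * len(p)         # uniform removal

for name, q in [("sqrt", q_sqrt), ("proportional", q_prop), ("uniform", q_unif)]:
    print(name, round(mean_total(p, q), 4))
```

Under these assumptions the minimum equals the square of the sum of the square roots of the p_i, which follows from the Cauchy-Schwarz inequality.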


37. Fitting the Logistic by Maximum Likelihood. J. L. Hodges, Jr., University 
of California, (By Title). 


A method is presented by means of which the maximum likelihood estimates of the 
logistic response function may be quickly obtained to graphical accuracy, without the use 
of a computing machine or special charts. The basic idea is to replace the observed response 
numbers by equivalent ones for which the estimates are obvious. (Received February 19, 
1958.) 


38. Useful Bayes Solutions for Multiple Comparisons Problems. I. (Preliminary 
Report) David B. Duncan, University of North Carolina. 


A Bayes solution is developed for the common t-test problem of testing the hypothesis 
$\theta \le 0$ against the alternative $\theta > 0$ given observed values of $x$ and $s$, 
where $x$ is normally distributed with $\theta$ as mean and variance $\sigma^2$ and $s^2$ 
is an independent estimate of $\sigma^2$ distributed as $\chi^2_\nu \sigma^2/\nu$. The 
ultimate objective is to solve many forms of multiple comparisons 
problems generated by the restricted products (Lehmann, Ann. Math. Stat., 1957, pp. 1-25) 
of problems of the given form, the Bayes solutions to be obtained as corresponding products 
of solutions of the form developed. The loss function assumes losses proportional to 
$|\theta|$, the factor for type I errors being $k$ times that for type II errors, 
$k \ge 1$. The Bayes function is a normal density with mean 0 and variance 
$\gamma^2 \sigma^2$. These functions fit, at least to a satisfactory degree of 
approximation, a wide variety of problems met in practice. The solution 
(restricted to invariant procedures) has the critical region $x/s > t$ where $t$ is a 
function of the degrees of freedom $\nu$, loss ratio $k$ and dispersion ratio $\gamma^2$. 
A brief table of $t$ with these three arguments is presented. (Research jointly supported 
by the U.S. Public Health Service and by the U.S. Air Force through the Office of 
Scientific Research of the Air Research and Development Command.) (Received February 20, 
1958.) 


(Abstracts of papers presented at the Cambridge, Massachusetts Meeting of the Institute, 
August 25-30, 1958.) 


39. Determining Bounds on Integrals with Applications to Cataloging Problems. 
Bernard Harris. 


Assume that a random sample of size $N$ has been drawn from a multinomial population 
with an unknown and perhaps countably infinite number of classes. The experimenter 
wishes to predict $d(a)$, the number of classes that will be observed in a second sample of 
size $aN$, $a > 1$ (or when the sample size is increased by $(a - 1)N$ additional 
observations); and $C(a)$, the coverage of a second sample (or augmented sample), where 
$C(a) = \sum p_j$, the sum is to be taken over those classes for which at least one 
representative has been observed in the sample. It is shown that 
$E\,d(a) \sim d + n_1 E\{[1 - e^{-(a-1)x}]/x\}$, and 
$E\,C(a) \sim 1 - (n_1/N) + (n_1/N) E\{1 - e^{-(a-1)x}\}$, where $d$ is the number of 
classes observed, $n_1$ is the number of classes occurring once in the sample, and the 
expectation is taken with respect to a distribution function unknown to the experimenter, 
but estimates of the moments are available. Hence a reasonable procedure is to compute 
upper and lower predictors of $d(a)$ and $C(a)$ by determining the suprema and infima of 
the above expected values subject to moment constraints. 

Several results are given concerning bounds on integrals subject to moment constraints, 
and a method of determining the sharpest bounds is shown. The explicit solutions are 
computed for 0, 1, 2, 3 moment constraints and applied to several examples. (Received 
January 23, 1958.) 
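
As an illustration of how a moment constraint narrows the predictor of d(a) in abstract 39, the following minimal sketch treats the simplest case of a single known first moment. It is not claimed to be the method of the paper. It uses only two elementary facts about g(x) = [1 - exp(-(a-1)x)]/x on x >= 0: g is convex, so Jensen's inequality gives E g(x) >= g(E x); and g(x) <= g(0) = a - 1. The function names and the numerical inputs are hypothetical.

```python
import math


def g(x, c):
    """(1 - exp(-c*x)) / x, extended by its limit c at x = 0."""
    return c if x == 0 else -math.expm1(-c * x) / x


def predictor_bounds_one_moment(d, n1, a, mean_x):
    """Elementary bounds on d + n1 * E[g(x)] when only E[x] = mean_x is known."""
    c = a - 1.0
    lower = d + n1 * g(mean_x, c)   # Jensen: g is convex on x >= 0
    upper = d + n1 * c              # g(x) <= g(0) = c for every x >= 0
    return lower, upper


def predictor_two_point(d, n1, a, x1, x2, w1):
    """Value of d + n1 * E[g(x)] when x puts weight w1 on x1 and 1 - w1 on x2."""
    c = a - 1.0
    return d + n1 * (w1 * g(x1, c) + (1.0 - w1) * g(x2, c))


# Illustrative numbers only: 100 classes seen, 40 seen once, sample doubled.
lower, upper = predictor_bounds_one_moment(d=100, n1=40, a=2.0, mean_x=1.0)
value = predictor_two_point(d=100, n1=40, a=2.0, x1=0.5, x2=1.5, w1=0.5)
print(lower <= value <= upper)
```

Any distribution with the stated mean, such as the two-point distribution used here, gives a value between the two bounds; further moment constraints would tighten them.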


40. Single Server Queuing Processes with a Finite Number of Sources. Gerald 
Harrison, The Teleregister Corporation. 


A service system is considered which consists of a single server and a finite number of 
sources. The sources are assumed to be non-interacting and to have the same negative ex- 
ponential idle time distribution. The service time is assumed to have an arbitrary distribu- 
tion with a finite mean. There are no defections from the waiting line, and the service time 
is independent of the length of the waiting line. The stationary behavior of this service 
system is studied. The relations between load factor, mean delay, mean service time, mean 
source idle time, and proportion of calls delayed are obtained. The length of the waiting 
line at instants of termination of service is a Markov chain, and finding its stationary 
distribution is thus reduced to solving a system of linear equations which, because of the 
form of the transition matrix, reduces to a simple iterative procedure. Under the assumption 
of the queuing discipline of service in the order of arrival the waiting time distribution is 
obtained. These results are specialized to the cases of constant and negative exponential 
service time distributions. (Received February 13, 1958.) 
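
For the negative exponential special case mentioned in abstract 40, the time-stationary distribution of the number of sources at the server has a well-known birth-and-death form, and a minimal sketch of it follows. This is the continuous-time stationary distribution, not the embedded chain at service terminations studied in the abstract, and the function name and parameter names are hypothetical.

```python
def finite_source_stationary(num_sources, idle_rate, service_rate):
    """Stationary distribution of the number of sources waiting or in service.

    Birth-and-death model: with n sources at the server, new requests arrive
    at rate (num_sources - n) * idle_rate and service completes at rate
    service_rate (exponential service is assumed here).
    """
    rho = idle_rate / service_rate
    weights = [1.0]
    for n in range(1, num_sources + 1):
        # Detailed balance between states n - 1 and n.
        weights.append(weights[-1] * (num_sources - n + 1) * rho)
    total = sum(weights)
    return [w / total for w in weights]


num_sources, idle_rate, service_rate = 5, 0.2, 1.0   # illustrative values
probs = finite_source_stationary(num_sources, idle_rate, service_rate)

utilization = 1.0 - probs[0]                          # fraction of time server is busy
mean_at_server = sum(n * pn for n, pn in enumerate(probs))
throughput = service_rate * utilization               # completed services per unit time
mean_time_at_server = mean_at_server / throughput     # Little's formula

print(round(utilization, 4), round(mean_time_at_server, 4))
```

The throughput computed from the server side agrees with the request rate from the idle sources, idle_rate * (num_sources - mean_at_server), which serves as a consistency check on the distribution.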





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of The Institute news items of interest. 


Personal Items 


Gerald D. Berndt has taken a position as Mathematician at Headquarters, 
Strategic Air Command. He is Deputy Chief of Programming and Consultant 
in mathematics and statistics to the Commander in Chief in the Directorate of 
Operations. 

Dr. John R. Bowman has left his post as Director of Research for Mellon 
Institute in Pittsburgh to take up the duties of the new Associate Dean of the 
Technological Institute at Northwestern University in Evanston, Illinois. 

Robert J. Buehler, formerly instructor in mathematics and project associate 
at the Naval Research Laboratory, University of Wisconsin, has been appointed 
Assistant Professor in the Department of Statistics and the Statistical Labora- 
tory at Iowa State College, Ames, Iowa. 

Mavis Carroll has moved with General Foods Research Center to their new 
laboratories in Tarrytown, New York. She was recently named Section Head 
in the Product Evaluation Division, which includes responsibility for subjective 
testing and statistical services. 

Eugene Crystal is now employed by Smith Kline & French Laboratories in 
Philadelphia, Pennsylvania. Mr. Crystal received an M.S. degree from Rutgers 
University in 1957. 

Dr. Herbert A. David has resigned from the University of Melbourne to 
accept an appointment as Professor of Statistics at the Virginia Polytechnic 
Institute. 

Dr. Olive Jean Dunn has resigned her position as Assistant Professor of Sta- 
tistics at Iowa State College, Ames, Iowa, and has accepted a position as Assis- 
tant Professor of Biostatistics, University of California at Los Angeles. 

Carl H. Fischer, Professor of Actuarial Mathematics, University of Michi- 
gan, has been appointed by Secretary of Health, Education, and Welfare Marion B. 
Folsom to a 12-member advisory council which is to review the long-range financial 
position of the Social Security System. 

Dr. John E. Freund, formerly of the Department of Statistics of Virginia 
Polytechnic Institute, is now connected with Arizona State University. His 
new residence is 7532 E. Holly, Scottsdale, Arizona. 

Professor Bernard Friedman, formerly of the Institute for Mathematics and 
Mechanics at New York University, has accepted a position as Professor of 
Mathematics at the University of California at Berkeley. 

Paul Gunther has taken a leave of absence from the Statistical Methods 
Section of the General Electric Company and is now at the University of Chicago 
working on his Ph.D. degree. 
Dr. Robert G. Hoffman has resigned from his position as Statistician for the 
Commission on Professional and Hospital Activities and has accepted an ap- 
pointment as Statistician for the J. Hillis Miller Health Center and Assistant 
Research Professor of Statistics, Statistical Laboratory, University of Florida. 

Donald F. Mills has completed requirements for the Ph.D. degree at the 
University of Washington and is now an Assistant Professor of Education at 
Arizona State College, Tempe. 

Sidney I. Neuwirth is with Johnson & Johnson, New Brunswick, New Jersey, 
as Manager of Operations Research. He joined them in December, 1956, to 
organize and administer a company-wide O.R. program. 

John W. Pratt is now Assistant Professor in the new Department of Statistics 
at Harvard University. 

Lt. (j.g.) F. Beckley Smith, Jr., expects to be released from active duty with 
the U. S. Navy in February, 1958. His new address will be 264 Atlanta Drive, 
Pittsburgh 28, Pennsylvania. 

Earl A. Thomas has accepted a position as Technical Advisor to the Manager, 
Ballistic Missiles Division, Burroughs Research Center, Paoli, Pennsylvania. 

John Tukey is in residence at the Center for Advanced Study in the Behavioral 
Sciences at Stanford, California, as a Fellow. He will return to Princeton Uni- 
versity and Bell Telephone Laboratories in September, 1958. 

Pearl A. Van Natta, formerly with the Denver Research Institute, has ac- 
cepted the position of biostatistician with the Child Research Council in Den- 
ver. 

Robert F. White has been chosen to receive the George W. Snedecor Award 
in Statistics for 1958 at Iowa State College by vote of the graduate faculty in 
statistics. The award is given annually to the person judged to be most out- 
standing among students at the College working toward a Ph.D. or joint Ph.D. 
in statistics who are expected to graduate within a specified period of time. The 
award consists of a year’s membership in The Institute of Mathematical Statis- 
tics together with a subscription to that professional society’s journal. White is 
currently employed as a full-time associate on statistical problems in the Agri- 
cultural Experiment Station. 

William H. Williams, formerly on a Canadian Research Council fellowship, 
has been appointed Assistant Professor in the Department of Statistics and the 
Statistical Laboratory at Iowa State College, Ames, Iowa. 

Leroy F. Wolins, formerly associate director of research, Science Research 
Associates, Chicago, has been appointed Assistant Professor in the Department 
of Statistics and Psychology and the Statistical Laboratory at Iowa State Col- 
lege, Ames, Iowa. 

G. Stanley Woodson has accepted an appointment as Biostatistician for the 
Commission on Professional and Hospital Activities in Ann Arbor, Michigan. 
The new post is supported, in part, by a grant from the W. K. Kellogg Founda- 
tion. 
Summer Institute on Nonparametric Statistics 


With the financial support of the National Science Foundation, the Institute 
of Mathematical Statistics is planning a Summer Statistical Institute (SSI) at 
the University of Minnesota (School of Business Administration) from June 16 
to July 26. The SSI is being organized by a committee appointed by the IMS 
consisting of J. L. Hodges, Jr., W. Hoeffding, W. Kruskal, and I. R. Savage. 
The committee is planning to invite several research workers, with major and 
continued interests in nonparametric methods, to participate. The plan of the 
SSI is also to have members who are just beginning their research work or who 
have not in the past concentrated in the field of nonparametric inference, for 
example individuals who are completing their graduate work involving research 
in nonparametric statistics as well as other research workers whose interests are 
now moving towards nonparametric statistics. 

The funds allotted by the National Science Foundation permit support to 
some advanced graduate students who are interested in pursuing their own prob- 
lems in the area of NP methods. The funds may also permit the paying of travel 
expenses for individuals who are supported in other ways. Also the institute will 
welcome a few other workers who do not need any financial support. 




Summer Offerings in Statistics at Iowa State College 


The Department of Statistics at Iowa State College will offer six applied 
courses in statistical theory and methods in its two 1958 summer sessions. These 
courses are planned primarily for graduate students or research workers with 
limited mathematical backgrounds who wish to use statistical techniques intelli- 
gently for application to other fields. In addition, a course on special topics in 
theoretical or applied statistics may be studied at the graduate level. Senior 
staff members will be available during most of the summer for consultations on 
research or special problems. 

Students may register for either or both of the six-week summer sessions: 
June 17-July 23 and July 23-August 29. The complete list of statistics offerings 
for the first session is as follows: Stat. 401, “Statistical Methods for Research 
Workers” (at the level of Snedecor’s Statistical Methods); Stat. 447, “Statistical 
Theory for Research Workers” (mainly theory of experimental statistics at the 
level of Anderson and Bancroft’s Statistical Theory in Research); Stat. 599, “Spe- 
cial Topics”; and Stat. 699, “Research”. In the second session will be offered 
Stat. 402, a continuation of 401; Stat. 448, a continuation of 447; two courses in 
applied methods which are more specialized, Stat. 411, “Experimental Designs 
for Research Workers”, and Stat. 421, “Survey Designs for Research Workers”; 
and finally Stat. 599 and 699. Additional information may be obtained from 
T. A. Bancroft, department head and Director, Statistical Laboratory, Iowa 
State College. 
Rumanian Membership 


The application of Rumania for membership in the International Mathe- 
matical Union (Group II) has been approved by unanimity of the voting nations. 
The membership of Rumania became effective on March 1, 1958. 




NAS-NRC Chairmanship 


Detlev W. Bronk, president of the National Academy of Sciences, has an- 
nounced the appointment of Samuel S. Wilks, professor of mathematical statistics 
at Princeton University, as the new Chairman of the Mathematics Division of 
the National Academy of Sciences-National Research Council. Dr. Wilks suc- 
ceeds Paul A. Smith, professor of mathematics at Columbia University, who 
served as Division chairman since 1955. As Chairman, Dr. Wilks will supervise 
the continuing activities of the Division advisory to the Federal government in 
matters pertaining to mathematics. The Academy-Research Council, a private 
organization of distinguished scientists dedicated to the furtherance of science 
and its use for the general welfare, is authorized by the Federal government to 
act, upon request, as an official adviser in all matters of scientific and technical 
interest. In addition, the Mathematics Division serves as a clearing house for 
information of concern to mathematicians throughout the United States. It 
constantly strives to promote mathematical research and to improve the teach- 
ing of mathematics at all levels. As part of the Academy-Research Council, 
whose eight divisions embrace all the natural sciences, the Mathematics Division 
has a unique opportunity to develop productive interchange between mathe- 
maticians and other scientists. 




New Members 


The following persons have been elected to membership in The Institute 
November 1, 1957, to February 3, 1958 


Abernathie, Donald H., M.S. (Univ. of Illinois), Research Engineer, Convair Astronautics, 
San Diego, California; 5335 Via Bello, San Diego 11, Calif. 

Adkins, George B., M.A. (Univ. of Missouri), Chief, Mathematical Statistics Branch, Di- 
vision of Nuclear Materials Management, Atomic Energy Commission, 1901 Consti- 
tution Ave., Washington 25, D. C.; 1111 Arlington Blvd., Arlington, Virginia. 

Behnken, Donald W., M.B.A. (Columbia Univ.), Graduate Assistant, Institute of Statis- 
tics, North Carolina State College, Raleigh, North Carolina; 1631 Van Dyke Ave., Ra- 
leigh, N.C. 

Bell, Alan Edward, M.S. (Stanford Univ.), Research Assistant, Applied Mathematics and 
Statistics Laboratory, Stanford University, Stanford, California; 1024 Ramona, Palo 
Alto, Calif. 

Bennett, William S., M.A. (Duke Univ.), Operations Analyst, Operations Research Office, 
7100 Connecticut Ave., Chevy Chase, Md. 
Berman, Simeon M., B.A. (City College of New York), Lecturer, Dept. of Mathematics, 
The City College of New York; 1086 President St., Brooklyn 25, N.Y. 

Bhapkar, Vasant Prabhakar, M.S. (Bombay Univ.), Lecturer in Statistics, Univ. of Poona, 
Poona 7, India; Dept. of Statistics, Univ. of North Carolina, Chapel Hill, North Carolina. 

Bird, Marion T., Ph.D. (Univ. of Illinois), Professor of Mathematics, San Jose State Col- 
lege, San Jose 14, Calif.; 45 Pala Ave., San Jose 27, Calif. 

Christensen, Inge F., M.A. (Catholic Univ.), Student, Catholic University of America, 
Michigan Ave., N.E., Washington 17, D. C.; 2725 29th Street, N.W., Washington 8, D.C. 

Coyne, Lolafaye, M.A. (Univ. of Kansas), Statistician, Research Dept., The Menninger 
Foundation, 2617 W. 6th, Topeka, Kansas. 

Dembiczak, Cecilia M., M.S. (Univ. of Connecticut), Research Analyst, United Aircraft 
Corporation, East Hartford, Connecticut; 35 Potter Road, North Haven, Conn. 

Denton, James Q., B.S. (California Institute of Technology), Research Assistant, Depart- 
ment of Mathematics, University of Oregon, Eugene, Oregon. 

Deshpande, J. V., M.Sc. (Univ. of Poona), Lecturer in Statistics, Govt. College of Science, 
Nagpur (Bombay State), India. 

Dongwoo, Lee, Student, College of Liberal Arts and Sciences, Seoul National University, 
Seoul, Korea; *101-2 Hioza-dong, Chongno-gu, Seoul, Korea. 

Doyle, Francis J., Jr., B.S. (Manhattan College), Graduate Assistant, Case Institute of 
Technology, Cleveland 6, Ohio; 2099 Abington Road, Cleveland 6, Ohio. 

Fisz, Marek, Doctor (Univ. of Wroclaw), Extraordinary Professor, University of Warsaw, 
Mathematical Institute of the Polish Academy of Sciences; ul. J. Dabrowskiego 81/51, 
Warsaw, Poland. 

Foster, Frederic Gordon, D.Ph. (Oxford), Reader in Statistical Computing, London School 
of Economics and Political Science, Houghton Street, Aldwych, London W. C. 2, 
England. 

Kada, Jimmy Masakazu, B.S. (Univ. of California, Los Angeles), Student, School of Public 
Health, University of California at Los Angeles, 405 Hilgard Ave., Los Angeles 24, 
Calif.; 248 N. Burlington Ave., Los Angeles 26, Calif. 

Kikuchi, Susumu, B.E. (Keio-Gijuku Univ.), Assistant, The Institute of Polytechnics, 
Osaka City University; Minamiogimachi, Kitaku, Osaka, Japan. 

Kimme, Ernest G., Ph.D. (Univ. of Minnesota), 1E237, Bell Telephone Laboratories, 
Murray Hill, New Jersey. 

Kishen, K., Ph.D. (Univ. of Lucknow), Chief Statistician to Government, U. P., 
Department of Agriculture, Chhota Chhattar Manzil, Lucknow, India. 

Krishnaiah, Paruchuri R., M.A. (Univ. of Minnesota), Research Assistant, Bureau of Edu- 
cational Research, University of Minnesota, Minneapolis 14, Minnesota. 

Laha, Radha Govinda, D.Ph. (Calcutta Univ.), Research Associate, Dept. of Mathematics, 
Catholic University of America, Washington 17, D. C. 

Levenbach, George J., M.E.E. (Univ. of Technology, Delft), Statistician, Bell Telephone 
Laboratories, Inc., Murray Hill, New Jersey. 

Lewis, Charles T., B.S. (Univ. of Wyoming), Analyst, Geotechnical Corporation, 114 
Grand Avenue, Laramie, Wyoming; 29 Wainwright, Laramie, Wyoming. 

Li, C. C., Ph.D. (Cornell Univ.), Assistant Professor, Dept. of Biostatistics, Graduate 
School of Public Health, University of Pittsburgh, Pittsburgh, Pa. 

Murthy, Vrudhula Krishna, B.S. (Bombay Univ.), Research Assistant, Dept. of Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 

Oldmixon, E. Robert, Engineer, Quality Control Dept. (H-34), Surface Armament Division, 
Sperry Gyroscope Co., Great Neck, L. I., New York. 

Popper, Juliet, Ph.D. (Stanford Univ.), Research Fellow in Psychology, Department of 
Psychology, Indiana University, Bloomington, Indiana. 

Prabhu, Narahari Umanath, M.Sc. (Univ. of Manchester), Graduate Assistant, Dept. of 
Mathematics, Univ. of Western Australia, Nedlands, Western Australia; Dept. of 
Statistics, Karnatak University, Dharwar, (Mysore State), India. 
Rao, Vishvember P., M.A. (Univ. of Bombay), Lecturer in Statistics, College of Science, 
Nagpur, India. 

Roche, John E., Ph.D. (Univ. of Texas), Associate Professor, Dept. of Business Adminis- 
tration, Francis Hall, Texas A & M College, College Station, Texas. 

Rove, Gene, B.S. (Drexel Inst. of Tech.), Engineer, Gruen Industries, Time Hill, Cincin- 
nati, Ohio; 3671 Meadville Drive, Sherman Oaks, California. 

Scarf, Herbert E., Ph.D. (Princeton Univ.), Assistant Professor, Dept. of Statistics, Stan- 
ford University, Stanford, California. 

Schmetterer, Leopold Karl, Dr. rer. nat. (University Wien), Professor and Director, Insti- 
tut fur Versicherungs-mathematik und Mathematische Statistik, der Universitat 
Hamburg, Harvestehuderweg 10, Hamburg 13, Germany. 

Schumann, D. E. W., Ph.D. (Virginia Polytechnic Institute), Professor, Dept. of Statistics, 
University of Stellenbosch, Stellenbosch, South Africa. 

Searle, Shayle Robert, M.A. (Univ. of New Zealand), Research Statistician, New Zealand 
Dairy Board, P. O. Box 866, Wellington, New Zealand, and Student, Cornell University, 
Ithaca, New York; Wing Hall, Cornell Univ., Ithaca, N.Y. 

Stoker, D. J., Doctor (Univ. of Amsterdam), Senior Lecturer, Dept. of Statistics, University 
of Pretoria, Pretoria, Union of South Africa. 

Strotz, Robert H., Ph.D. (Univ. of Chicago), Associate Professor, Dept. of Economics, 
Northwestern University, Evanston, Illinois. 

Watterson, Geoffrey A., B.A. (Melbourne Univ.), Student, Australian National University, 
Box 4, G.P.O., Canberra, A.C.T., Australia. 

Witting, Hermann, Dr. rer. nat. (Mathematisches Institut, Universitat Freiburg), Dozent 
Institut fur Angewandte Mathematik, Universitat Freiburg, Freiburg i Br., Germany; 
Dreikonigstrasse 9, Freiburg i Br., Germany. 




SMU Computing Laboratory 


Southern Methodist University announces the opening of a Computing Lab- 
oratory on its campus. A new building houses the Univac Scientific 1103 Com- 
puter, the Remington Rand Service Bureau and the S.M.U. Computing Lab- 
oratory offices and classrooms. The computer is operated jointly by Remington 
Rand as a service to industry and by S.M.U. as an academic service for research 
and teaching. 

The S.M.U. operation is associated with the University’s new Graduate Re- 
search Center. Professors and students have free use of the machine for academic 
research and training in computer work. Training programs are available for 
faculty and students. Computing projects are now underway in the fields of 
engineering, mathematics, psychology, law, religion, management and others. 
S.M.U. will make the computer arrangement involving only a nominal fee for 
overhead, and invites inquiries leading to such use of the machine. S.M.U. re- 
gards its laboratory as a regional university computing facility. 




Doctoral Dissertations in Statistics, 1957 


Listed below are doctorates conferred during the year 1957 in the United 
States and Canada for which the dissertations were written on topics in statistics 
or related fields. The university, major subject, and the title of the dissertation 
are given in each case. Readers are invited to notify the Editor of any omissions 
from this list. 


Israel Jacob Abrams, University of California, Berkeley, major in statistics, “Contribu- 
tions to the Stochastic Theory of Inventory.” 

Satya Prakash Agarwal, University of California, Berkeley, major in statistics, “On the 
Asymptotic Equivalence of Two Classes of Tests of a Multiparameter Hypothesis.” 

Willard O. Ash, Virginia Polytechnic Institute, major in statistics, “Randomized Es- 
timates in Power Spectral Analysis.” 

Glenn E. Bartsch, Harvard, major in biostatistics, “Confidence Intervals for the Means 
of Non-Normal Populations.” 

Bradley Dean Bucher, Princeton, major in statistics, “The Recovery of Inter-Variety 
Information in Incomplete Block Design.” 

R. L. Carter, University of North Carolina, “New designs for the exploration of response 
surfaces.” 

D. L. Clark, Oregon State College, minor in physics, “The distribution of linear func- 
tionals of stochastic processes.” 

Leonard Cohen, Columbia, major in mathematical statistics, “On Mixed Single Sample 
Experiments.” 

T. G. Donnelly, University of North Carolina, “A family of sequential tests.” 

James D. Esary, University of California, Berkeley, major in statistics, “A Stochastic 
Theory of Accident Survival and Fatality.” 

Alfred E. Garrett, Virginia Polytechnic Institute, major in statistics, “Estimation 
Problems Connected with Stochastic Processes.” 

Edmund Alpheus Gehan, North Carolina State College, major in experimental statistics, 
“Some Statistical Methods Applicable to Studies of Infrequent Diseases with Application 
to a Study of Rheumatic Heart Disease in North Carolina.” 

Nathaniel Roy Goodman, Princeton, major in statistics, “The Joint Estimation of the 
Spectra, Co-Spectrum and Quadrature Spectrum of a Two-Dimensional Stationary Gaus- 
sian Process.” 

Ramanathan Gnanadesikan, University of North Carolina, “Contributions to multi- 
variate analysis including univariate and multivariate variance components analysis and 
factor analysis.” 

Geoffrey Gregory, Stanford, major in statistics, “An Economic Approach to the Choice 
of Continuous Sampling Plans.” 

Mary I. Hanania, University of California, Berkeley, major in statistics, “Some Statisti- 
cal Tests of Hypotheses in Learning Theory.” 

Leon H. Herbach, Columbia, major in mathematical statistics, “Optimum Properties of 
Analysis of Variance Tests Based on Model II and Some Generalizations of Model II.” 

Mohammad Iqbal, University of North Carolina, “On the classification statistic of 
Wald.” 

Philip G. Johnson, University of Minnesota, major in mathematics; minor statistics, 
“Shares of Functions with Values in a Banach Algebra.” 

Gordon H. Josie, Johns Hopkins University, major in biostatistics, “Sampling Varia- 
tion as a Factor in Morbidity Survey Design.” 

J. W. Lamperti, California Institute of Technology, minor in philosophy, “On the asymp- 
totic behavior of recurrent and ‘almost recurrent’ events.” 

Robert James Lundegard, Purdue, major in mathematical statistics, “Identification and 
Estimation into Stochastic Models.” 

Leo Lynch, Virginia Polytechnic Institute, major in statistics, “On the Analysis of 
Paired Ranked Observations.” 

William Mendenhall, North Carolina State College, major in experimental statistics, 
“Estimation of Parameters of Mixed Exponentially Distributed Failure Time Distributions 
from Censored Life Test Data.” 

Paul Dixon Minton, North Carolina State College, major in experimental statistics, 
“Some Distributions Related to Column Totals in Sociometric Matrices.” 

Robert Dean Morrison, North Carolina State College, major in experimental statistics, 
“Some Studies on the Estimates of the Exponents in Models Containing One and Two Ex- 
ponentials.” 

Vasant Lakshman Mote, North Carolina State College, major in experimental statistics, 
“An Investigation of the Effect of Misclassification on the χ² Tests in the Analysis of Cate- 
gorical Data.” 

Mangalore V. Pai, Purdue, major in statistics, “Comparisons of the Methods of Classi- 
fication.” 

Robert N. Pendergrass, Virginia Polytechnic Institute, major in statistics, “The Rank 
Analysis of Triple Comparisons.” 

Robert Richard Read, University of California, Berkeley, major in statistics, “Contri- 
butions to the Statistical Theory of Cloud Chamber Data.” 

Robert H. Riffenburgh, Virginia Polytechnic Institute, major in statistics, “Linear Dis- 
criminant Analysis.” 

Judah I. Rosenblatt, Columbia, major in mathematical statistics, “Goodness-of-Fit 
Tests for Approximate Hypotheses.” 

Mandakini Janardan Sane, University of California, Berkeley, major in statistics, “I. 
Locally Unbiased Tests of Composite Hypotheses with s Constraints.” 

James H. Stapleton, Purdue, major in statistics, “On the Theory of Asymptotic Distri- 
butions (mod 1) and Its Extension to Abstract Spaces.” 

James G. C. Templeton, Princeton, major in statistics, “A Test for Detecting Single-cell 
Disturbances in Contingency Tables.” 

J. B. Tysver, University of Michigan, “Inherent errors in matrices with statistical 
applications.” 

Dirk van der Reyden, North Carolina State College, major in experimental statistics, 
“The Use of Orthogonal Polynomial Contrasts in the Confounding of Factorial Experi- 
ments.” 

J. W. Walker, University of North Carolina, “Optimal decomposition of a sample space 
for estimations based on grouped data.” 

David Zeitlen, University of Minnesota, major in mathematics; minor statistics, “Be- 
havior of Conformal Maps under Analytic Deformation of the Domain.” 

Alexis Zinger, University of Montreal, major in mathematical statistics, “On the Choice 
of the Best Amongst Three Normal Populations with Known Variances.” 




REPORT OF THE GATLINBURG, TENNESSEE MEETING 
OF THE INSTITUTE 


The 1958 Eastern Regional Meeting, seventy-seventh meeting of the Institute 
of Mathematical Statistics, was held in Gatlinburg, Tennessee on April 10-12, 
1958, in conjunction with the Biometric Society (Eastern North American 
Region). 


The following 118 members of the Institute registered for the meeting: 


G. E. Albert, Mrs. G. E. Albert, G. N. Alexander, Williard O. Ash, G. J. Atta, Earl 
Atwood, R. E. Bargman, V. P. Bhapkar, Allan Birnbaum, Barbara Bishop, C. I. Bliss, 
V. J. Bofinger, R. C. Bose, G. E. P. Box, R. A. Bradley, A. E. Brandt, Ellis Brenna, W. 
N. Carey, Jr., Charles Carroll, D. Chaudhuri, Victor Chew, Mary Ann Cipolloni, A. Bruce
Clarke, W. H. Clatworthy, Charles W. Clunies-Ross, A. C. Cohen, W. D. Commins, W. E. 
Conner, William S. Connor, Dennis Cooke, Richard Cornell, Jerome Cornfield, L. C. A. 
Corsten, Constance Cox, Edwin L. Cox, Gertrude M. Cox, J. B. Cox, Elliot M. Cramer, 
Jonas M. Dalton, H. A. David, Earl Diamond, N. Draper, J. R. Duffett, David Duncan, 
Arthur M. Dutton, Lila Elveback, Jean Engler, A. L. Finkner, Jack Fleischer, S. M. Free,
Rudolph J. Freund, Donald A. Gardiner, John J. Gart, A. Garzadela, E. Gehan, S. Geisser,
H. Ginsburg, W. Glenn, R. Gnanadesikan, Arnold Grandage, J. E. Grizzle, Joan M. Gurian,
M. A. Guzman, R. J. Hader, W. Haenszel, W. J. Hall, Eugene K. Harris, Boyd Harshbarger,
Hubert Hill, Wassily Hoeffding, R. G. Hoffman, Harold Hotelling, W. H. Horton,
J. F. Hudson, David Hurst, Paul E. Irick, M. A. Kastenbaum, Therese Kelleher, George 
Kennedy, Allyn W. Kimball, Marcus Kjelsberg, Carl F. Kossack, Roy R. Kuebler, R. G.
Laha, E. L. LeClerg, G. J. Levenbach, H. L. Lucas, Eugene Lukacs, Mary Lum, Nathan
Mantel, Frank Martin, Margaret P. Martin, P. A. Miller, D. F. Morrison, George Mor- 
ton, V. K. Murthy, M. D. Nefzger, George E. Nicholson, Jr., Paul S. Olmstead, D. Quade, 
H. F. Robinson, A. C. Rohloff, J. B. Roy, Charles F. Sarle, M. A. Schneiderman, Oliver A. 
Shaw, S. S. Shrikhande, Paul Somerville, D. E. South, Harold Storz, R. J. Taylor, George
W. Thomson, Malcolm Turner, Ronald E. Walpole, G. S. Watson, M. B. Wilk, E. J. 
Williams, R. Lowell Wine. 


The program of the meeting was as follows: 


THURSDAY, APRIL 10, 1958 


9:00-10:00 A.M.—Registration 
9:00-10:00 A.M.—Biometric Society Regional Advisory Board Meeting 
10:00-12:00 A.M.—Non-Linear Estimation


Chairman: R. Lowell Wine, Hollins College
1. Some Recent Work on Non-Linear Estimation and Design, G. E. P. Box, Princeton 
University 
2. Estimation for Linear Combinations of Exponentials, R. G. Cornell, United States
Public Health Service
3. Non-Linear Hypotheses, M. B. Wilk, Bell Telephone Laboratories


1:45-3:45 P.M.—Design of Experiments 


Chairman: W. H. Clatworthy, Westinghouse Electric Corporation
1. Use of the Direct Product of Matrices in the Analysis of Factorials, W. S. Connor,
National Bureau of Standards and R. C. Bose, University of North Carolina
2. Analysis of Variance of a Randomized Block Design with Missing Observations, W. A.
Glenn and Clyde Y. Kramer, Virginia Polytechnic Institute


4:00-6:00 P.M.—Contributed Papers I (I.M.S.)


Chairman: Dudley South, University of Florida

1. An Upper Bound for the Variance of Certain Statistics, Wassily Hoeffding,
University of North Carolina

2. A Markov Chain Resulting from a Certain Sorting Problem, A. Bruce Clarke,
University of Michigan

3. Second Order Rotatable Designs in Three or More Factors, R. C. Bose and Norman
R. Draper, University of North Carolina

4. An Optimum Property of Some Bechhofer-Type Non-Sequential Multiple-Decision
Rules, William Jackson Hall, University of North Carolina

5. Useful Bayes Solutions for Multiple Comparisons Problems. I. Preliminary Report,
David B. Duncan, University of North Carolina

6. On the Laws of Cauchy and Gauss, R. G. Laha, Catholic University

7. Tests on a Variance-Covariance Matrix, Nathan Mantel, National Cancer Institute,
and Seymour Geisser, National Institute of Mental Health

8. On the Simple von Neumann Model of Dynamic Economic Equilibrium as a Markov
Chain (Preliminary Report), David Rosenblatt, American University (By title)

9. On a Test for the Equality of Several Means, D. V. Ramachandram, Demographic
Training and Research Centre, Bombay, India (By title)

10. On a Test for the Equality of Several Variances, K. V. Ramachandram, Demographic
Training and Research Centre, Bombay, India (By title)

11. Fitting the Logistic by Maximum Likelihood, J. L. Hodges, Jr., University of
California, Berkeley (By title)


8:00-9:00 P.M.—Special Invited Address (The Biometric Society)


Chairman: A. W. Kimball, Oak Ridge National Laboratory
The First Decade of the Biometric Society, C. I. Bliss, Connecticut Agricultural
Experiment Station and Yale University


9:00 P.M.—Smoker


Host: The Union Carbide Nuclear Company 


FRIDAY, APRIL 11, 1958 
8:30-10:30 A.M.—Special Topics I


Chairman: Donald A. Gardiner, Oak Ridge National Laboratory
1. Some Topics in the Analysis of Contingency Tables, V. L. More, M. V. Pavate and
R. L. Anderson, North Carolina State College. (Presented by E. J. Williams,
A. Grandage and V. J. Bofinger)
2. Optimum Allocation for Estimation of Polynomial Regression, E. J. Williams, North
Carolina State College


10:45 A.M.-12:15 P.M.—Contributed Papers II (The Biometric Society)


Chairman: Jack Fleischer, North Carolina State College
1. Triangle, Duo-Trio, and Difference-From-Control Tests in Taste Testing, Ralph A.
Bradley, Virginia Polytechnic Institute
2. A Sequential Decision Procedure for Comparing Survival Curves, John J. Gart, Oak
Ridge National Laboratory and Virginia Polytechnic Institute
3. On Problems in Residual Analysis, Rudolph J. Freund and Richard W. Vail, Jr.,
Virginia Polytechnic Institute
4. Mixed Exponential Failure Distributions, C. W. Clunies-Ross, Virginia Polytechnic
Institute
5. Estimation of System Reliability from Component Reliabilities, James R. Duffett,
Virginia Polytechnic Institute


1:45-3:45 P.M.—Applications in the Physical Sciences 


Chairman: Carl Kossack, Purdue University
Statistical Activity in the AASHO Road Test, W. N. Carey, Jr., Chief Engineer for
Research, and P. E. Irick, Chief, Data Analysis Branch, Highway Research
Board, American Association of State Highway Officials Road Test


4:00-6:00 P.M.—Statistical Genetics


Chairman: H. F. Robinson, North Carolina State College

1. Estimation of Sperm Frequencies in Drosophila, M. A. Kastenbaum, Oak Ridge
National Laboratory

2. Adaptation of High-Speed Computing Machines for Empirical Selection Studies, F. G.
Martin, Jr., and C. C. Cockerham, North Carolina State College

3. Estimation and Use of Genotype-Environmental Interaction Components of Variance
in Cotton Breeding, P. A. Miller, J. C. Williams and H. F. Robinson, North
Carolina State College

4. A Synthesis of Diallel Cross Methodology, Therese Kelleher, United States
Department of Agriculture


8:00-10:00 P.M.—Special Invited Address (I.M.S.)


Chairman: Harold Hotelling, University of North Carolina
On the Construction of Error Detecting and Error Correcting Binary Codes, R. C.
Bose, University of North Carolina


SATURDAY, APRIL 12, 1958 
8:30-10:30 A.M.—Special Topics II


Chairman: Boyd Harshbarger, Virginia Polytechnic Institute
1. Paired Comparisons and Tournaments, H. A. David, Virginia Polytechnic Institute
2. Dependence in Multivariate Analysis, R. E. Bargmann, Virginia Polytechnic
Institute




PUBLICATIONS RECEIVED 


Steck, G. P., Upper Confidence Limits for the Failure Probability of Complex Networks,
SC-4133(TR), Sandia Corporation, Albuquerque, New Mexico, $1.25.

Boletín de la Sociedad Matemática Mexicana, Sociedad Matemática Mexicana, Tacuba 5,
Mexico 1, D. F., Mexico.

Anuario Estadístico de España, 1957, Presidencia del Gobierno, Instituto Nacional de
Estadística, Ferraz 41, Madrid, Spain.

DuBois, Philip H., Multivariate Correlational Analysis, Harper and Brothers, 49 East 33rd
Street, New York 16, New York. pp. vii-202. $4.50.

Eves and Newsom, The Foundations and Fundamental Concepts of Mathematics, Rinehart
and Company, Inc., 232 Madison Ave., New York 16, New York. ix-363, $6.75.

Dorfman, Robert, Paul A. Samuelson and Robert M. Solow, Linear Programming
and Economic Analysis, McGraw-Hill Book Company, 330 West 42nd Street, New York
36, New York. v-527, $10.00.